Wandering Thoughts archives

2024-02-07

What I'd like in a hypothetical new desktop machine in 2024

My current work desktop and home desktop are getting somewhat long in the tooth, which has caused me to periodically think about what I'd want in new hardware for them. Sometimes I even look at potential hardware choices for such a replacement desktop (which can lead to grumbling). Today I want to write down my ideal broad specifications for such a new desktop, what I'd get if I could get it all in one spot for an affordable price.

In addition to all of the expected things (like onboard sound), I'd like:

  • 64 GB of RAM instead of my current 32 GB. It would be nice if it was ECC RAM in a system that genuinely supported it, and it would also be nice if it was fast, but those two attributes are often in opposition to each other.

    (Today I suspect this means choosing DDR5 over DDR4.)

  • Three motherboard M.2 NVMe drive slots. I'd like three because I currently have a mirrored pair of NVMe drives, and having a third slot would let me replace one of the two live drives without having to pull the old one first. Two motherboard M.2 NVMe slots (both operating at PCIe x4) is probably my minimum these days, and I already have a PCIe M.2 NVMe card for the current work desktop.

    My work desktop has 500 GB NVMe drives currently and I'd like to get bigger ones. My home desktop is fine with its current drives.

  • At least four SATA ports and ideally more. My office desktop has two SSDs and a SATA DVD-RW drive (because we still sometimes use those), and I want to be able to run three SSDs at once while replacing one of the two SSDs. Six SATA ports would be better, so perhaps I should say I can live with four SATA ports but I'd like six.

    (My home desktop will also need three SATA ports on a routine basis with a fourth available for drive replacement, but that's for another entry.)

  • At least three 1G Ethernet ports for my work desktop. Since I don't think there are any reasonable desktop motherboards with this many Ethernet ports, this needs at least a dual-port PCIe card and perhaps a quad-port card, which I already have at work. It also needs a suitable PCIe slot to be free and usable given any other cards in the machine. My home desktop can get by with one port but I'd probably like to have two or three there too.

    (I wouldn't need that many but Linux's native virtualization works best if you give it its own network port.)

    Although various desktop motherboards have started offering speeds above 1G (although often not full 10G-T), our work wiring situation is such that there's no real prospect of taking advantage of that any time soon. But if a motherboard comes with '2.5G' or '5G' networking with a chipset that's decent and well supported by Linux, I wouldn't say no.

  • At least two DisplayPort and/or HDMI outputs that support at least 4K at 60 Hz, and I'd like more for future-proofing. I would prefer two DisplayPort outputs to a DisplayPort + HDMI pairing; this is readily available in GPU cards but not really in motherboards and integrated graphics. At work I currently have two 27" HiDPI displays and at home I currently have one; in both locations the biggest constraint on larger displays or more of them is physical space.

    (I'd love it if we were moving into a bright future of high resolution, high DPI, high refresh rate displays, but I don't think we are, so I don't really expect to want more than dual 4K at 60Hz for the next half decade or more. It's possible this is too pessimistic and there are viable 5K+ monitors that I might want at home in place of my current 27" 4K HiDPI display.)

  • Open source friendly graphics, which in practice excludes Nvidia GPUs (especially if I care about good Wayland support), and possibly the discrete Intel GPU cards (I'm not sure of their state). I think anything reasonably modern will support whatever OpenGL features Wayland needs or is likely to need. The easy way to get this might well be integrated graphics on a current generation CPU, assuming I can get the output ports that I want.

    On the other hand, the Intel ARC A380 seems to be okay on Linux (from some Internet searches), and while it has a fan it's alleged to be able to operate very quietly. It would give me the multiple DisplayPort outputs and high resolution, high refresh rate support.

  • A decent number of both USB-A and USB-C ports. I'd like a reasonable number of USB-A ports because I still have a lot of USB-A things and I'd rather not have a whole collection of USB-A hubs sitting around on either my office or my home desk. But more hubs (or larger ones) are probably in my future.

I'd like it if the machine still supported old fashioned BIOS MBR booting and didn't require (U)EFI booting (I have my reasons), although UEFI booting is probably better on desktop motherboards than it used to be. The UEFI story for people who want to boot from mirrored pairs of drives may also be better on Fedora than it used to be; Ubuntu 22.04, for example, already has some support for duplicate UEFI boot partitions.

(I'm absolutely not interested in trying to mirror the EFI System Partition behind the back of the UEFI BIOS.)

It would be nice to get a good CPU performance increase from my current desktops, but on the one hand I sort of assume that any decent desktop CPU today is going to be visibly better than something from more than five years ago, and on the other hand I'm not sure how noticeable the performance improvement is these days, and on the third hand I've been wrong before. If my current (five year old) desktops have reached the point where CPU performance mostly doesn't matter to me, then I'd probably prefer to get a midrange CPU with decent thermal performance and perhaps no funny slow 'efficiency' cores that can give you and Linux's kernel CPU scheduling various sorts of heartburn. On the other hand, my Firefox build times keep getting slower and slower, so I suspect that the world of software just assumes current CPUs and current good performance.

PS: I have no plans to do GPU computation on my desktops, for a variety of reasons including that I don't want to deal with Nvidia GPUs in my machines. If I need to do GPU stuff for work, our SLURM cluster has GPUs, and I don't have to care how much power they use, how noisy they are, and how much heat they put out because they're in the machine room (and I'm not).

linux/MyMachineDesires2024 written at 23:50:44; Add Comment

2024-02-06

What the max_connect Linux NFS v4 mount parameter seems to do

Suppose, not hypothetically, that you've converted your fleet from using NFS v3 to using basic Unix security NFS v4 mounts when they mount their hordes of NFS filesystems from your NFS fileservers. When your NFS clients boot or at some other times, you notice that you're getting a bunch of copies of a new kernel message:

SUNRPC: reached max allowed number (1) did not add transport to server: <IP address>

Modern NFS uses TCP, which means that the NFS client needs to make some number of TCP connections to each NFS server. In NFS v3, Linux normally only makes one connection to each server. The same is sort of true in NFS v4 as well, but NFS v4 is more complex about what is 'a server'. In NFS v3, servers are identified by at least their IP address (and perhaps their name; I'm not sure if two different names that map to the same IP will share the same connection). In NFS v4.1+, servers have some sort of intrinsic identity that is visible to clients even if you're talking to them by multiple IP addresses.

This new 'reached max allowed number (<N>) did not add transport to server' kernel message is reporting about this case. You (we) have a single NFS server that for historical reasons has two different IPs, one for most of its filesystems and one for our central administrative filesystem, and now NFS v4 considers these the 'same' server and won't make an extra connection to the second IP.

You might wonder if you can change this, and the answer is that you can, but it gets complex and I'm not quite sure how it all works to distribute the actual NFS traffic. There appear to be two interlinked things that you can control: how many connections an NFS v4 client will make to a single NFS server, and how many different IPs of the server that client will connect to. How many connections NFS v4 will make to a single server is mostly controlled by nfs(5)'s nconnect setting, sort of like nconnect's behavior with NFS v3. How many different server IPs NFS v4 will connect to is controlled by 'max_connect'. Both of these default to 1. However, how they interact is confusing and I'm not sure I fully understand it.

The easy case is not setting nconnect and setting max_connect to at least as many different IP aliases as you have for each fileserver. In this case you'll get one TCP connection per server IP (although don't ask me what traffic flows over what connection). If you set nconnect without max_connect, you'll get however many connections to the first IP address of each server (well, the first IP address that the client finds), assuming that you mount at least that many NFS filesystems from that server.

However, if you set both nconnect and max_connect, what seems to happen (on Ubuntu 22.04) is that you get nconnect TCP connections to each server's first (encountered) IP address, and then one TCP connection to every other IP address (up to the max_connect limit). This is why I described 'nconnect' as controlling how many connections NFS v4 would make to a single server, instead of a single server IP (or name). It would be a bit more useful if you could set nconnect on a per-IP (or name) basis in NFS v4, or otherwise make it so that the first IP didn't get all of the connections.
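
To make this concrete, here's a sketch of what the fstab mount options might look like for one filesystem; the fileserver name, paths, and numbers are made up, and the option names are as documented in nfs(5):

fs1:/export/home  /h/home  nfs4  rw,hard,nconnect=2,max_connect=4  0 0

Going by the behavior described above (as observed on Ubuntu 22.04), this should get you two TCP connections to the first IP address of 'fs1' that the client encounters, plus one connection to each additional IP of the same server, up to the max_connect limit.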

(This is apparently called 'trunking' in NFS v4, per RFC 5661 section 2.10.5 (via).)

linux/NFSv4MaxConnectEffects written at 22:49:05; Add Comment

2024-02-05

We might want to regularly keep track of how important each server is

Today we had a significant machine room air conditioning failure in our main machine room, one that certainly couldn't be fixed on the spot ('glycol all over the roof' is not a phrase you really want to hear about your AC's chiller). To keep the machine room's temperature down, we had to power off as many machines as possible without too badly affecting the services we offer to people here, which are rather varied. Some choices were obvious; all of our SLURM nodes that were in the main machine room got turned off right away. But others weren't things we necessarily remembered right away, or we weren't clear on whether they were safe to turn off and what effects that would have. In the end it took several rounds of turning servers off, looking at what was left, spotting remaining machines, and turning more things off, and we're probably not done yet.

(We have secondary machine room space and we're probably going to have to evacuate servers into it, too.)

One thing we could do to avoid this flailing in the future is to explicitly (try to) keep track of which machines are important and which ones aren't, to pre-plan which machines we could shut down if we had a limited amount of cooling or power. If we documented this, we could avoid having to wrack our brains at the last minute and worry about dependencies or uses that we'd forgotten. Of course documentation isn't free; there's an ongoing amount of work to write it and keep it up to date. But possibly we could do this work as part of deploying machines or changing their configurations.

(This would also help identify machines that we didn't need any more but hadn't gotten around to taking out of service, which we found a couple of in this iteration.)

Writing all of this just in case of further AC failures is probably not all that great a choice of where to spend our time. But writing down this sort of thing can often help to clarify how your environment is connected together in general, including things like what will probably break or have problems if a specific machine (or service) is out, and perhaps which people depend on what service. This can be valuable information in general. The machine room archaeology of 'what is this machine, why is it on, and who is using it' can be fun occasionally, but you probably don't want to do it regularly.

(Will we actually do this? I suspect not. When we deploy and start using a machine its purpose and so on feel obvious, because we have all of the context.)

sysadmin/TrackingMachineImportance written at 23:14:53; Add Comment

2024-02-04

I switched to explicit imports of things in our Django application

I wrote our Django application a long time ago, when I didn't know Django and was sort of in a hurry, so I used what I believe was the Django style at the time of often doing broad imports of things from both Django modules and especially the application's other modules:

from django.conf.urls import *
from accounts.models import *

This wasn't universal; even at the time it was apparently partly the style to import only specific things from Django modules, and I followed that style in some of our code.

However, when I moved the application to Python 3 I also switched all of these over to specific imports. This wasn't required by Django (or by Python 3); instead, I did it because it made my editor complain less. Specifically it made Flycheck in GNU Emacs complain less (in my setup). I decided to do this change because I wanted to use Flycheck's list of issues to check for other, more serious issues, and because Flycheck specifically listed all of the missing or unknown imports. Because Flycheck listed them for me, I could readily write down everything it was reporting and see the errors vanish. When I had everything necessary imported, Flycheck was nicely quiet (about that).

Some of the import lines wound up being rather long (as you can imagine, the application's views.py uses a lot of things from our models.py). Even still, this is probably better for a future version of me who has to look at this code later. Some of what comes from the application models is obvious (like core object types), but not all of it; I was using some imported functions as well, and now the imports explicitly list where they come from. And for Django modules, now I have a list of what I'm using from them (often not much), so if things change in a future Django version (such as the move from django.conf.urls to django.urls), I'll be better placed to track down the new locations and names.
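
To give a flavour of the change, a views.py import section goes from the wildcard style shown above to something more like this (the model and function names here are illustrative stand-ins, not our actual ones, and the real lists are much longer):

from django.http import HttpResponse, Http404
from accounts.models import Account, Request, get_sponsor_for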

In theory I could have made this change at any time. In practice, I only made it once I'd configured GNU Emacs for good Python editing and learned about Flycheck's ability to show me the full error list. Before then all of the pieces were too spread apart and too awkward for me to reach for.

(Of course, this isn't the first time that my available tools have influenced how I programmed in a way that I noticed.)

python/DjangoExplicitImportsSwitch written at 21:50:01; Add Comment

2024-02-03

Solving one of our Django problems in a sideways, brute force way

A few years ago I wrote about an issue with propagating some errors in our Django application. We have two sources of truth for user authorization, one outside of Django (in Unix group membership that was used by Apache HTTP Basic Authentication), and one inside Django in a 'users' table; these two can become desynchronized, with someone in the Unix group but not in the application's users table. The application's 'retrieve a user record' function either returns the user record or raises an Http404 exception that Django automatically handles, which means that someone who hasn't been added to the user table will get 404 results for every URL, which isn't very friendly. I wanted to handle this by finding a good way to render a different error page in this case, either by customizing what the 'Http404' error page contained or by raising a different error.

All of this is solving the problem in the obvious way and also a cool thing to (try to) do in Django. Who doesn't want to write Python code that handles exceptional cases by, well, raising exceptions and then having them magically caught and turn into different rendered pages? But Django doesn't particularly support this, although I might have been able to add something by writing an application specific piece of Django middleware that worked by catching our custom 'no such user' exception and rendering an appropriate template as the response. However, this would have been my first piece of middleware, so I held off trying anything here until we updated to a modern version of Django (partly in the hopes it might have a solution).

Then, recently a simpler but rather less cool option to deal with this whole issue occurred to me. We have a Django management command that checks our database for consistency in various ways (for example, unused records of certain types, or people in the application's users table who no longer exist), which we run every night (from cron). Although it was a bit of a violation of 'separation of concerns', I could have that command know about the Unix group(s) that let people through Apache, and then have it check that all of the group members were in the Django user table. If people were omitted, we'd get a report. This is pretty brute force and there's nothing that guarantees that the command's list of groups stays in synchronization with our Apache configuration, but it works.
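
A minimal sketch of what such a check might look like is below; the group name, model name, and field name are hypothetical stand-ins rather than our actual code, and a real version would report things the way the rest of the consistency checks do.

# Hypothetical consistency check: report Unix group members who are
# missing from the Django users table. All names here are made up.
import grp
from django.core.management.base import BaseCommand
from accounts.models import Account   # stand-in for the real users model

APACHE_GROUPS = ("appusers",)   # has to be kept in sync with the Apache config

class Command(BaseCommand):
    help = "Report people in the Apache Unix group(s) but not in Django"

    def handle(self, *args, **options):
        unix_members = set()
        for group in APACHE_GROUPS:
            # gr_mem only lists explicit group members, which is what we want here
            unix_members.update(grp.getgrnam(group).gr_mem)
        known = set(Account.objects.values_list("login", flat=True))
        for login in sorted(unix_members - known):
            self.stdout.write("%s: in the Unix group but not in Django" % login)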

It's also a better experience for people than the cool way I was previously considering, because it lets us proactively fix the problem before people encounter it, instead of only reactively fixing it after someone runs into this and reports the issue to us. Generally, we'll add someone to the Unix group, forget to add them to Django, and then get email about it the next day before they ever try to use the application, letting us transparently fix our own mistake.

(This feels related to something I realized very early about not trying to do everything through Django's admin interface.)

python/DjangoSolvingProblemSideways written at 21:44:04; Add Comment

2024-02-02

One of my MH-E customizations: 'narrow-to-pending' (refiles and deletes)

I recently switched from reading my email with exmh, a graphical X frontend to (N)MH, to reading it with MH-E in GNU Emacs, which is also a frontend to (N)MH. I had a certain number of customizations for exmh, and for reasons beyond the scope of this entry, I wound up with more for MH-E. One of those customizations is a new MH-E command (and keybinding for it), mh-narrow-to-pending.

Both exmh and MH-E process deleting messages and refiling them to folders in two phases. In the first phase you read your email and otherwise go over the current folder, marking messages to be deleted and refiled; once you're satisfied, you tell them to actually execute these pending actions. MH-E also has a general feature to limit what messages are listed in the current folder. In Emacs jargon this general idea is known as narrowing, and there are various tools to 'narrow' the display of buffers to something of current interest. My customization narrows to show only the messages in the current folder that have pending actions on them; these are the messages that will be affected if you execute your pending actions.

So here's the code:

(defun mh-narrow-to-pending ()
  "Narrow to any message with a pending refile or delete."
  (interactive)
  (if (not (or mh-delete-list mh-refile-list))
      (message "There are no pending deletes or refiles.")
    (when (assoc 'mh-e-pending mh-seq-list) (mh-delete-seq 'mh-e-pending))
    (when mh-delete-list (mh-add-msgs-to-seq mh-delete-list 'mh-e-pending t t))
    (when mh-refile-list
      (mh-add-msgs-to-seq
       (cl-loop for folder-msg-list in mh-refile-list
                append (cdr folder-msg-list))
       'mh-e-pending t t))
    (mh-narrow-to-seq 'mh-e-pending)))

(This code could probably be improved, and reading it I've discovered that I've already forgotten what parts of it do and the details of how it works, although the broad strokes are obvious.)

Writing this code required reading the existing MH-E code to find out how it did narrowing and how it marked messages that were going to be refiled or deleted. In the usual GNU Emacs way, this is not a documented extension API for MH-E, although in practice it's unlikely to change and break my code. To the best of my limited understanding of making your own tweaks for GNU Emacs modes like MH-E, this is basically standard practice; generally you grub around in the mode's ELisp source, figure things out, and then do things on top of it.

There are two reasons that I never tried to write something like this for exmh. The first is that exmh doesn't do anywhere near as much with the idea of 'narrowing' the current folder display. The other is that I wound up using the two differently. In MH-E, it's become quite common for me to pick through my inbox (or sometimes other folders) for messages that I'm now done with, going far enough back in one pass that I wind up with a sufficient patchwork that I want to double check what exactly I'm going to be doing before I commit my changes. Since I can easily narrow to messages in general, narrowing to see these pending changes was a natural idea.

(Picking through the past week or more of email threads in my inbox has become a regular Friday activity for me, especially given that MH-E has a nice threaded view.)

It's easy to fall into the idea that any readily extendable program is kind of the same, because with some work you can write plugins, extensions, or other hacks that make it dance to whatever tune you want. What my experience with extending MH-E has rubbed my nose into is that the surrounding context matters in practice, both in how the system already works and in what features it offers that are readily extended. 'Narrow to pending' is very much an MH-E hack.

programming/MHENarrowToPending written at 22:57:05; Add Comment

2024-02-01

Our Django application is now using Python 3 and a modern Django

We have a long standing Django web application to handle the process of people requesting Unix accounts here and having the official sponsor of their account approve it. For a long time, this web app was stuck on Python 2 and Django 1.10 after a failed attempt to upgrade to Django 1.11 in 2019. Our reliance on Python 2 was obviously a problem, and with the not so far off end of life of Ubuntu 20.04 it was getting more acute (we use Apache's mod_wsgi, and Ubuntu 22.04 and later don't have a Python 2 version of that for obvious reasons). Recently I decided I had to slog through the process of moving to Python 3 and a modern Django (one that is actually supported) and it was better to start early. To my pleasant surprise the process of bringing it up under Python 3 and Django 4.2 was much less work than I expected, and recently we migrated the production version. At this point it's been running long enough (and has done enough) that I'm calling this upgrade a success.

There are a number of reasons for this smooth and rapid sailing. For a start, it turns out that my 2019 work to bring the app up under Python 3 covered most of the work necessary, although not all of it. Our previous problems with CSRF and Apache HTTP Basic Authentication have either been sidestepped by Django changes since 1.11 or perhaps mitigated by Django configuration changes based on a greater understanding of this area that I worked out two years ago. And despite some grumpy things I've said about Django in the past, our application needed very few changes to go from Django 1.10 to Django 4.2.

(Most of the Django changes seem to have been moving from 'load staticfiles' to 'load static' in templates, and replacing use of django.conf.urls.url() with django.urls.re_path(), although we could probably do our URL mapping better if we wanted to. There are other minor changes, like importing functions from different places, changing request.POST.has_key(X) to X in request.POST, and defining DEFAULT_AUTO_FIELD in our settings.)
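
As an illustration, the URL mapping change amounts to roughly the following (the URL pattern and view here are invented for the example):

# Django 1.10 style:
#   from django.conf.urls import url
#   urlpatterns = [url(r'^request/$', views.makerequest)]
# Django 4.2 style:
from django.urls import re_path
from accounts import views   # hypothetical views module

urlpatterns = [re_path(r'^request/$', views.makerequest)]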

Having this migration done and working takes a real load off of my mind for the obvious reasons; neither Python 2 nor Django 1.10 are what we should really be using today, even if they work, and now we're free to upgrade the server hosting this web application beyond Ubuntu 20.04. I'm also glad that it took relatively little work now.

(Probably this will make me more willing to keep up to date with Django versions in the future. We're not on Django 5.0 because it requires a more recent version of Python 3 than Ubuntu 20.04 has, but that will probably change this summer or fall as we start upgrades to Ubuntu 24.04.)

python/DjangoAppNowPython3 written at 23:06:25; Add Comment

2024-01-31

Using IPv6 has quietly become reliable (for me)

I've had IPv6 at home for a long time, first in tunneled form and later in native form, and recently I brought up more or less native IPv6 for my work desktop. When I first started using IPv6 (at home) and for many years afterward, there were all sorts of complications and failures that could be attributed to IPv6 or that went away when I turned off IPv6. To be honest, when I enabled IPv6 on my work desktop I expected to run into a fun variety of problems due to this, since before then it had been IPv4 only.

To my surprise, my work desktop has experienced no problems since enabling IPv6 connectivity. I know I'm using some websites over IPv6 and I can see IPv6 traffic happening, but at the personal level, I haven't noticed anything different. When I realized that, I thought back over my experiences at home and realized that it's been quite a while since I had a problem that I could attribute to IPv6. Quietly, while I wasn't particularly noticing, the general Internet IPv6 environment seems to have reached a state where it just works, at least for me.

Since IPv6 is everyone's future, this is good news. We've been collectively doing this for long enough and IPv6 usage has climbed enough that it should be as reliable as IPv4, and hopefully people don't make common oversights any more. Otherwise, we would collectively have a real problem, because turning on IPv6 for more and more people would be degrading the Internet experience of more and more people. Fortunately that's (probably) not happening any more.

I'm sure that there are still IPv6 specific issues and problems that come up, and there will be more for a long time to come (until perhaps they're overtaken by year 2038 problems). But you can have problems that are specific to anything, including IPv4 (and people may already be having those).

(As more people add IPv6 to servers that are currently IPv4 only, we may also see a temporary increase in IPv6 specific problems as people go through 'learning experiences' of operating IPv6 environments. I suspect that my group will have some of those when we eventually start adding IPv6 to various parts of our environment.)

tech/IPv6NowReliableForMe written at 22:26:21; Add Comment

2024-01-30

Putting a Python executable in venvs is probably a necessary thing

When I wrote about getting the Python LSP server working with venvs in a brute force way, Ian Z aka nobrowser commented (and I'm going to quote rather than paraphrase):

I'd say that venvs themselves are "aesthetically displeasing". After all, having a separate Python executable for every project differs from having a separate LSP in degree only.

On Unix, this separate executable is normally only a symbolic link (although other platforms may differ), and the venv will normally have its own copy of pip, setuptools, and some other things, which can amount to 20+ Mbytes even on Linux. However, when I thought about it, I don't think there's any good option other than for the venv to have its own (nominal) copy of Python. The core problem is that venvs are very convenient when they're more or less transparently activated.

A Python venv is marked by a special file in the root of the venv, pyvenv.cfg. There are two ways that Python could plausibly decide when to automatically activate a venv without you having to set any environment variables; it can look around the environment of the Python executable you ran for this marker (which is what it does today), or it could look around the environment of your current directory, traversing up the filesystem to see if it could find a pyvenv.cfg (in much the same way that version control systems look for their special .git or .hg directory to mark the repository root).
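
For illustration, a typical pyvenv.cfg is only a few lines, something like the following; the exact keys and values vary with the Python version and how the venv was created:

home = /usr/bin
include-system-site-packages = false
version = 3.11.7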

The problem with automatically activating a venv based on what you find in the current directory and its parents is that it makes Python programs (and the Python interpreter) behave differently depending on where you are when you run them, including random system utilities that just happen to be written in Python. If the program requires any packages beyond the standard library, it may well fail outright because those packages aren't installed in the venv, and if they are installed in the venv they may not be the version the program needs or expects. This isn't a particularly good experience and I'm pretty confident that people would be very unhappy if this was what Python did with venvs.

The other option is to not automatically activate venvs at all and always require you to set environment variables (or the local equivalent). The problem with this is that it's a terrible experience for actually using venvs to, for example, deploy programs as encapsulated entities. You can't just ship the venv and have people run programs that have been installed into its bin/ subdirectory; now they need cover scripts to set the venv environment variables (which might be automatically generated by pip or whatever, but still).

So on the whole embedding the Python interpreter seems the best choice to me. That creates a clear logic, one that people can predict, for which venv (if any) is automatically activated: it's the venv whose Python you're running. Of course I wish it didn't take all of that disk space for extra copies of pip and setuptools, but you can't have everything.
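
One way to see this logic in action is that sys.prefix follows the Python you ran, which is what 'which venv is active' comes down to. A quick check, with a made-up venv path:

$ /tmp/scratch-venv/bin/python3 -c 'import sys; print(sys.prefix)'
/tmp/scratch-venv
$ /usr/bin/python3 -c 'import sys; print(sys.prefix)'
/usr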

python/VenvsAndEmbeddedPython written at 21:28:13; Add Comment

2024-01-29

What I think goes wrong periodically with our Grafana Loki on restarts

We have now had two instances where restarting Grafana Loki caused it to stop working. Specifically, shortly after restart, Loki began logging a flood of mysterious error messages of the form:

level=warn ts=2024-01-29T19:01:30.[…]Z caller=logging.go:123 [...] msg="POST /loki/api/v1/push (500) 148.309µs Response: \"empty ring\\n\" [...] User-Agent: promtail/2.9.4; [...]"

This is obviously coming from promtail trying to push logs into Loki, but I got 'empty ring' errors from trying to query the logs too. With a flood of error messages and these messages not stopping, both times I resorted to stopping Loki and deleting and restarting its log database (which we've also had to do for other reasons).

As far as I can tell from Internet searches, what Loki's 'empty ring' error message actually means here is that some component of Loki has not (yet) started properly. Although we operate it in an all in one configuration that I can't recommend, Loki is 'supposed' to be operated as a cooperative fleet of a whole horde of individual microservices, or at least as three separate services ("three-target" mode). To operate in these modes with possibly multiple instances of each (micro)service, Loki uses hash rings to locate which instance of a particular component should be used. When Loki reports an 'empty ring' error, what it means is that there's nothing registered in the hash ring it attempted to use to find an instance. Which hash ring? Loki doesn't tell you; you're presumably expected to deduce it from context. Although we're operating Loki in an all-in-one configuration, Loki apparently still internally has hash rings (most of them probably with exactly one thing registered) and those hash rings can be empty if there are issues.

(As best I can tell from Loki metrics, our current configuration has hash rings for the ingester, the (index) compactor, and the (query?) scheduler.)
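
(If you want to look at the ingester hash ring directly, Loki exposes its state over HTTP; assuming Loki's default HTTP port of 3100, something like 'curl http://localhost:3100/ring' should show what is, or isn't, registered in it.)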

Since this error comes from promtail log pushing, the most likely component to have not registered itself is the ingester, which receives and processes incoming log lines, eventually writing them to your chunk storage, which in our case is the filesystem. The ingester doesn't immediately write each new log line to storage; instead it aggregates them into those chunks in memory and then writes chunks out periodically (when they are big enough or old enough). To avoid losing chunks that aren't yet full if Loki is stopped for some reason, the ingester uses a write ahead log (WAL). As the Loki documentation says, when the ingester restarts (which in our case means when Loki restarts), it must replay the WAL into memory before 'registering itself as ready for subsequent writes', to quote directly from the documentation. I have to assume that what Loki really does is that the ingester replays the WAL before adding itself to the ingester hash ring. So while WAL replay is happening there are probably no ingesters registered in your ingester hash ring and attempts to push logs will fail (well, be rejected) with 'empty ring'.

Due to how Loki actually creates and stores chunks and doesn't compact them, we try to have as few chunks as possible, which means that we have a very long chunk lifetime and a large maximum chunk size. This naturally leads to having a lot of chunk and log data sitting in the memory of our Loki process, and (probably) a big WAL, although how big will depend partly on timing. The ingester's WAL can be configured to be flushed on regular shutdown (the flush_on_shutdown wal option) but we have historically turned this off so that Loki restarts don't flush out a bunch of small chunks (plus flushing a big WAL will take time). So after our Loki has been running for long enough, when it shuts down it will have a large WAL to replay on startup.

So what I believe happened is that our configuration wound up with a very big ingester WAL, and when Loki started, the ingester just sat there replaying the WAL (which is actually visible in the Loki metrics like loki_ingester_wal_replay_active and loki_ingester_wal_recovered_bytes_total). Since the ingester was not 'ready', it did not register in the ingester hash ring, and log pushing was rejected with 'empty ring'. Probably if I had left Loki alone long enough (while it spewed messages into the log), it would have finished WAL replaying and all would have been fine. There's some indication in historical logs that this has actually happened in the past when we did things like reboot the Loki host machine for kernel updates, although to a lesser extent than this time. Deleting and restarting the database fixes the problem for the obvious reason that with no database there's no WAL.

(This didn't happen on my Loki test machine because my test machine has far fewer things logging to it, only a couple of other test machines. And this also explains why, the first time around, reverting to our previous Loki version didn't help. We'd have seen the same problem if we'd restarted Loki without an upgrade, which is accidentally what happened this time.)

Probably the most important fix to this is to enable flushing the WAL to chunk storage on shutdown (along with vastly lengthening systemd's shutdown timeout for Loki, since this flushing may take a while). In practice we restart Loki very infrequently, so this won't add too many chunks (although it will make me more reluctant to restart Loki), and when it works it will avoid having to replay the WAL on startup. A related change is to raise the ingester wal parameter replay_memory_ceiling, because otherwise we'll wind up flushing a bunch of chunks on startup if we start with a big WAL (for example, if the machine lost power or crashed). And the broad fix is to not take 'empty ring' failures seriously unless they last for quite a long time. How long is a long time? I don't know, but probably at least ten minutes after startup.
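
In configuration terms I believe this comes down to something like the following fragment of the ingester settings; the specific values here are guesses to illustrate the shape, not tested recommendations:

ingester:
  wal:
    enabled: true
    dir: /var/loki/wal            # wherever the WAL already lives
    flush_on_shutdown: true       # flush chunks on shutdown instead of replaying them later
    replay_memory_ceiling: 8GB    # big enough that a large replay doesn't force chunk flushes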

(I believe that promtail will keep retrying after receiving this reply from Loki, and we have relatively long retry times configured for promtail before it starts discarding logs. So if this issue clears after ten or twenty minutes, the only large scale harm is a massive log spam.)

PS: Based on past experience, I won't know if I'm right for a fairly long time, probably at least close to a year.

sysadmin/GrafanaLokiStartupWALReplayIssue written at 21:13:47; Add Comment

