Wandering Thoughts archives

2022-01-27

The Linux kernel, simultaneous multithreading, and process scheduling

Back in an earlier entry on simultaneous multithreading, I said that I expected operating systems, Linux included, to generally schedule processes on to a single CPU of each core before they started doubling up processes on two CPUs of a single core. This raises the question of whether the Linux process scheduler is SMT-aware and what it seems to do in practice.

The answer to the first question is that it clearly is SMT aware, although I haven't traced through the code to see exactly what effects it has. There is a SCHED_SMT kernel configuration option that affects various things, and there are various comments about this (and code) in kernel/sched/fair.c and kernel/sched/topology.c, among other spots. The comments I've skimmed through suggest that the kernel does all of the obvious things with SMT pairs, like considering the caches 'hot' for a process on both CPUs if it was running recently on one of them.
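Before looking at what the scheduler does, it helps to know which CPUs are SMT siblings of each other. Here is a minimal sketch of reading that from sysfs, assuming a Linux system that exposes the usual `topology/thread_siblings_list` files under /sys/devices/system/cpu; the function names are my own for illustration, not anything from the kernel.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0,4' or '0-1,8-9' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def smt_siblings(sysfs_root="/sys/devices/system/cpu"):
    """Map each CPU number to the CPUs that share its core (itself included)."""
    result = {}
    pattern = "cpu[0-9]*/topology/thread_siblings_list"
    for path in Path(sysfs_root).glob(pattern):
        # path.parent.parent is the 'cpuN' directory; strip the 'cpu' prefix.
        cpu = int(path.parent.parent.name[3:])
        result[cpu] = parse_cpu_list(path.read_text())
    return result
```

Calling `smt_siblings()` on an SMT machine should give entries where each CPU lists two siblings; how the sibling numbers pair up varies from machine to machine.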

For how things go in practice, a fairly current Linux kernel (the Fedora 34 version of 5.15.15) running one CPU-consuming process per core (not per CPU) seems to mostly distribute CPU load the way I'd expect. Tools like htop and mpstat don't let me see the CPU (or core) scheduling history of individual processes, but the kernel doesn't do anything obviously weird-looking. CPU utilization does hop from CPU to CPU from time to time, which I suspect is an artifact of other running processes preempting the CPU hogs off their original CPU and on to the other CPU for that core. A CPU load that makes a bunch of system calls (basically an infinite loop in the shell) looks more erratic in htop; the CPU load appears to bounce around a lot and there's a bunch of system time involved too.
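One rough way to approximate a per-process view is to periodically sample the 'processor' field of /proc/PID/stat, which proc(5) documents as field 39, the CPU the process last ran on. This is a sketch only: it is sampling rather than a real scheduling history, so it will miss any migrations that happen between samples, and the function names and default intervals are made up for illustration.

```python
import time


def last_cpu_from_stat(stat_text):
    """Return the 'processor' field (field 39 in proc(5)) of /proc/PID/stat."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or ')', so split on the last ')' in the line.
    rest = stat_text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (the process state), so field 39 is rest[36].
    return int(rest[36])


def sample_cpus(pid, samples=20, interval=0.5):
    """Sample which CPU a process last ran on, 'samples' times."""
    seen = []
    for _ in range(samples):
        with open(f"/proc/{pid}/stat") as f:
            seen.append(last_cpu_from_stat(f.read()))
        time.sleep(interval)
    return seen
```

Running `sample_cpus(pid)` against one of the CPU hogs gives a list of CPU numbers that you can compare against the SMT sibling pairs to see whether it stays on one core.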

My conclusion from looking is that the Linux kernel in a normally operating system doesn't show any glaringly obvious pattern in which CPU of a core gets used. For instance, it's not like the first CPU of each core gets used almost all of the time and the load only spills over to the second CPU during high load periods. Even with very low load, every CPU can get used from time to time (you can see this in both htop and mpstat).

(In short, it's all boring, with no surprises or interesting things that I could see.)

linux/KernelSMTScheduling written at 23:23:57

Django and Apache HTTP Basic Authentication (and REMOTE_USER)

We have a Django application, and part of it exists behind Apache HTTP Basic Authentication. For reasons beyond the scope of this entry, I was recently rediscovering some things about how Django interacts with Apache HTTP Basic Authentication, and so I want to write them down for myself before I forget them again.

First, the starting point in the Django documentation for this is not to search for 'HTTP Basic Authentication' or anything like that, but for the howto on authenticating with REMOTE_USER, which is the environment variable that Apache injects when it's already authenticated something. I believe that if you search for 'Django' with 'Basic Authentication' on search engines, you tend to get information about making Django or Django-related things actually perform the server side of HTTP Basic authentication itself. This is fair enough but can be confusing.

Second, you only need to configure Django itself to authenticate with REMOTE_USER if you want to use Django's own authentication for something, such as access and authorization in its admin site. It's perfectly valid (although potentially annoying) to authenticate and limit access to your Django site (or parts of it) in your Apache configuration with Apache's HTTP Basic Authentication but have a separate Django login step to access the Django admin site or even parts of your application (which will then be tracked with cookies and so on). If you want to do this, you don't want to add Django's RemoteUserMiddleware and so on into your Django settings.

(You'll have to manage Apache users and Django users separately, passwords included, and they won't be the same thing. This might wind up being confusing.)
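For reference, the settings that the REMOTE_USER howto has you add (and that you would leave out in the separate-login setup described above) look roughly like the following. This is a sketch of a settings.py fragment based on the Django documentation, not our actual configuration; the elided middleware entries are whatever your project already has.

```python
# settings.py fragment (sketch): add these only if you want Django's own
# authentication to trust the REMOTE_USER value set by Apache.
MIDDLEWARE = [
    # ... your existing middleware, including SessionMiddleware ...
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    # RemoteUserMiddleware must come after AuthenticationMiddleware.
    "django.contrib.auth.middleware.RemoteUserMiddleware",
]

AUTHENTICATION_BACKENDS = [
    # Treats the REMOTE_USER login name as an already-authenticated user.
    "django.contrib.auth.backends.RemoteUserBackend",
]
```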

If you do have Django authenticating with REMOTE_USER, you need your Django database superuser to be something you can authenticate with through Apache. If you cleverly set your database superuser to 'admin' but you have no 'admin' in your Basic Auth database, you will be sad. It's possible to get yourself out of this in a couple of ways, but it's better to avoid it in the first place.

(When you do have Django authenticating this way, every person who uses your Django app through HTTP Basic Authentication will wind up with an entry in the Django 'User' table. Purging old logins that no longer exist is up to you, if you care. For people who you want to be able to use the Django admin site, you need to set them as at least 'Staff' in the Django User table. You can set them as database superusers too.)
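If you do care about purging old logins, the core of it is a set difference between the names in Django's User table and the names Apache still knows about. Here is a minimal sketch, assuming (and this is only an assumption) that your Basic Auth database is an htpasswd-style text file with one 'name:hash' entry per line; the function names are hypothetical.

```python
def htpasswd_usernames(text):
    """Return the set of login names in htpasswd-style text ('name:hash')."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        names.add(line.split(":", 1)[0])
    return names


def stale_logins(django_usernames, apache_usernames):
    """Names present in Django's User table but no longer known to Apache."""
    return sorted(set(django_usernames) - set(apache_usernames))
```

In a real Django shell, the first argument could come from something like `User.objects.values_list("username", flat=True)`. You would want to review the result by hand before deleting anything, since it will also list any Django-only accounts you created on purpose.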

It's not necessary to use Django's REMOTE_USER support in order to make use of the authentication information yourself, as long as Apache has HTTP Basic Authentication active. You can retrieve the login name from the $REMOTE_USER environment variable and look it up in your own 'User' table by hand, as we do. You may or may not want to automatically create new entries for new users, the way Django does by default. We don't because new people require some additional configuration on our side.
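As a rough illustration of the by-hand approach, here is a sketch that is not our actual code. In a Django view the `meta` argument would be `request.META`, which is where Django makes REMOTE_USER available; `our_users` is a plain dict standing in for whatever table you keep your own user records in, and both names are hypothetical.

```python
def find_our_user(meta, our_users):
    """Look up the Apache-authenticated login in our own user records.

    Returns the matching record, or None if Apache did not authenticate
    anyone or if the login has no record yet (nothing is auto-created).
    """
    login = meta.get("REMOTE_USER")
    if not login:
        return None
    return our_users.get(login)
```

Returning None for unknown logins instead of creating a record matches the choice described above, where new people need extra configuration before they can use the application.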

The corollary to this is that you can use and test your entire site under Apache HTTP Basic Authentication and never notice that Django isn't properly wired up to use REMOTE_USER. I believe that this can actually matter, because I believe that Django does some things with sessions differently when you have the RemoteUser* things enabled, and this interacts with Django's CSRF protections, which we've also had mysterious problems with.

python/DjangoApacheBasicAuth written at 00:40:55


