Understanding EGL, GLX and friends in the Linux and X graphics stack
Earlier this month I wrote a blog entry about Firefox having jank with WebRender turned on, in which I mentioned that the problem had appeared when I stopped using an environment variable ($MOZ_X11_EGL) that forced Firefox to use "EGL". The blog post led to someone filing a quite productive bug report, where I learned in passing that Firefox is switching to EGL by default in the future. This made me realize that I didn't actually know what EGL was, where it fit into the Linux and X graphics stack, and whether it was old or new. So here is a somewhat simplified explanation of what I learned.
In the beginning of our story is OpenGL, which in the 1990s became the dominant API (and pretty much the only one) for 3D graphics on Unix systems, as well as spreading to other platforms. However, OpenGL is more or less just about drawing things on a "framebuffer". Generally people on Unix and X don't want to just draw 3D things over top of the entire screen; they want to draw 3D things in an X window (or several) and then have those mix seamlessly with other 3D things being done by other programs in other windows. So you need to somehow connect the OpenGL world and the X world so that you can have OpenGL draw in a way that will be properly displayed in a specific X window and so on.
(This involves the action of many parties, especially once hardware acceleration gets involved and you have partially obscured windows with OpenGL rendering happening in them.)
The first version of an interconnection layer was GLX. As you can see from its features, GLX is a very X way of approaching the problem, since its default is to send all your OpenGL operations to the X server so that the X server can do the actual OpenGL things. The result inherits the X protocol's advantages of (theoretical) network transparency, at the cost of various issues. The 'glx' in programs like 'glxinfo' (used to find out whether your X environment has decent OpenGL capabilities) and 'glxgears' (everyone's favorite basic OpenGL on X test program) comes from, well, GLX. As suggested by the 'X' in its name, GLX is specific to the X Window System.
Eventually, various issues led to a second version of an interconnection layer. This time around the design was intended to be cross platform (instead of being tied to X) and it was done through the Khronos Group, the OpenGL standards organization. The result is EGL, and you can read (some of?) its manpages here. EGL will let you use more than just OpenGL, such as the simpler subset OpenGL ES, and I believe its API is platform and window system independent (although any particular implementation is likely to be specific to some window system). EGL apparently fixes various inefficiencies and design mistakes in GLX and so offers better performance, at least in theory. Also, pretty much everyone working on the Unix graphics stack likes EGL much more than GLX.
On Unix, EGL is implemented in Mesa, works with X, and has been present for a long time (back to 2012); current documentation is here. Wayland requires and uses EGL, which is unsurprising since GLX is specific to X (eg). I suspect that EGL on X is not in any way network transparent, but I don't know and haven't tested much (I did try some EGL programs from the Mesa demos and they mostly failed, although 'eglinfo' printed stuff).
On X, programs can use either the older GLX or the newer EGL in order to use OpenGL; if they want to use OpenGL ES, I believe they have to use EGL. Which one of GLX and EGL works better, has fewer bugs, and performs better has varied over time and may depend on your hardware. Generally the view of people working on Unix graphics is that everyone should move to EGL (cf), but in practice, well, Firefox has had a bug about it for nine years now and in my searches I've seen people say that EGL used to perform much worse than GLX in some environments (eg, from 2018).
While I'm here, Vulkan is the next generation replacement for OpenGL and OpenGL ES, at least for things that want high performance, developed by the Khronos Group. As you'd expect for something developed by the same people who created EGL, it was designed with an interconnection layer, Window System Integration (WSI). I believe that a Vulkan WSI is already available for X, as well as for Wayland. Vulkan (and its WSI) is potentially relevant for the future partly because of Zink, a Mesa project to implement OpenGL on top of Vulkan. If people like Intel, AMD, and maybe someday NVIDIA increasingly provide Vulkan support (open source or otherwise) that's better than their OpenGL support, Zink and Vulkan may become an important part of the overall stack. I don't know how an application using EGL and OpenGL on top of a Zink backend would interact with a Vulkan WSI, but I assume that Zink plumbs it all through somehow.
On Ubuntu, programs like 'eglinfo' are available in the mesa-utils-extra package. On Fedora, the egl-utils package gives you 'es2_info', but for everything else you'll need the mesa-demos package.
PS: For one high level view of the difference between OpenGL and OpenGL ES, see here.
Link: Examining btrfs, Linux’s perpetually half-finished filesystem
Ars Technica's Examining btrfs, Linux’s perpetually half-finished filesystem (via) is not very positive, as you might expect from the title. I found it a useful current summary of the practical state of btrfs, which is by all accounts still not really ready for use even in its redundancy modes that are considered "ready for production". There's probably nothing new for people who are actively keeping track of btrfs, but now I have something to point to if people ask why we're not and won't be.
Go generics have a new "type sets" way of doing type constraints
Any form of generics needs some way to constrain what types can be used with your generic functions (or generic types with methods), so that you can do useful things with them. The Go team's initial version of their generics proposal famously had a complicated method for this called "contracts", which looked like function bodies with some expressions in them. I (and other people) thought that this was rather too clever. After a lot of feedback, the Go team's revised second and third proposal took a more boring approach; the final design that was proposed and accepted used a version of Go interfaces for this purpose.
Using standard Go interfaces for type constraints has one limitation: because they only define methods, a standard interface can't express important constraints like 'the type must allow me to use < on its values' (or, in general, any operator). In order to deal with this, the "type parameters" proposal that was accepted allowed an addition to standard interfaces. Quoting from the issue's summary:
- Interface types used as type constraints can have a list of predeclared types; only type arguments that match one of those types satisfy the constraint.
- Generic functions may only use operations permitted by their type constraints.
Recently this changed to a new, more general, and more complicated approach that goes by the name of "type sets" (see also, and also). The proposal contains a summary of the new state of affairs, which I will quote (from the overview):
- Interface types used as type constraints can embed additional elements to restrict the set of type arguments that satisfy the constraint:
- an arbitrary type T restricts to that type
- an approximation element ~T restricts to all types whose underlying type is T
- a union element T1 | T2 | ... restricts to any of the listed elements
- Generic functions may only use operations supported by all the types permitted by the constraint.
Unlike before, these embedded types don't have to be predeclared ones and may be composite types such as maps or structs, although somewhat complicated rules apply.
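As a concrete illustration of the new syntax (the constraint and function names here are my own, not the proposal's), a constraint built from a union of approximation elements might look like this:

```go
package main

import "fmt"

// Ordered is an illustrative constraint. The union of approximation
// elements means "any type whose underlying type is int, int64,
// float64, or string", including locally defined types based on them.
type Ordered interface {
	~int | ~int64 | ~float64 | ~string
}

// Min can use the < operator because every type in Ordered's
// type set supports it.
func Min[T Ordered](a, b T) T {
	if a < b {
		return a
	}
	return b
}

func main() {
	fmt.Println(Min(3, 5))     // works for int
	fmt.Println(Min("b", "a")) // and for string
}
```

This follows the rule quoted above: Min may only use operations (here, <) supported by all the types the constraint permits.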
Type sets are more general and less hard coded than the initial version, so I can see why the generics design has switched over to them. But they're also more complicated (and more verbose), and I worry that they contain a little trap that's ready to bite people in the foot. The problem is that I think you'll almost always want to use an approximation element, '~T', but the arbitrary type element 'T' is the default. If you just list off some types, your generics are limited to exactly those types; you have to remember to add the '~' and then use the underlying type.
My personal view is that using type declarations for predeclared types is a great Go feature, because it leads to greater type safety. I may be using an 'int' for something, but if it's a lexer token or the state of an SMTP connection or the like, I want to make it its own type to save me from mistakes, even if I never define any methods for it. However, if using my own types starts making it harder to use other people's generics implementations (because they've forgotten that '~'), I'm being pushed away from it.
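The trap can be made concrete with a small hypothetical Go sketch (the type and constraint names are mine): a constraint that lists int without the '~' rejects a defined type like a lexer token, while the '~int' version accepts it.

```go
package main

import "fmt"

// Token is a defined type with underlying type int (say, a lexer token).
type Token int

// ExactInt's type set is just int itself; Token is not in it.
type ExactInt interface{ int }

// AnyInt's type set is every type whose underlying type is int,
// which includes Token.
type AnyInt interface{ ~int }

// Double compiles for Token only because AnyInt uses '~'.
// Written with ExactInt instead, Double(Token(21)) would fail to
// compile: Token does not satisfy ExactInt.
func Double[T AnyInt](v T) T { return v + v }

func main() {
	fmt.Println(Double(Token(21))) // a defined type works via ~int
	fmt.Println(Double(4))         // plain int works too
}
```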
Some of the mistakes of leaving out the '~' will be found early, and I think adding it wouldn't create API problems for existing users, so this may not be a big issue in practice. But I wish that the defaults were the other way around, so that you had to go out of your way to restrict generics to specifically those types with no derived types allowed.
(If you just list some types without using a union element you've most likely just created an unsatisfiable generic type with an empty type set. However you're likely to notice this right away, since presumably you're going to try to use your generics, if only in tests.)
It's probably not the hardware, a sysadmin lesson
We just deployed a new OpenBSD 6.9 machine the other day, and after it was deployed we discovered that it seemed to have serious problems with keeping time properly. The OpenBSD NTP daemon would periodically declare that the clock was unsynchronized, when it was adjusting the clock it was frequently adjusting it by what seemed to be very large amounts (by NTP standards), reporting numbers like '-0.244090s', and most seriously every so often the time would wind up completely off by tens of minutes or more. Nothing like this has happened on any of our other OpenBSD machines, especially the drastic clock jumps.
Once we noticed this, we flailed around looking at various things and wound up reforming the machine's NTP setup to be more standard (it was different for historical reasons). But nothing cured the problem, and last night its clock wound up seriously off again. After all of this we started suspecting that there was something wrong with the machine's hardware, or perhaps with its BIOS settings (I theorized wildly that the BIOS was setting it to go into a low power mode that OpenBSD's timekeeping didn't cope with).
Well, here's a spoiler: it wasn't the hardware, or at least the drastic time jumps aren't the hardware. Although we'll only know for sure in a few days, we're pretty sure we've identified their cause, and it's due to some of our management scripts (that are doing things well outside the scope of this entry).
When we have a mysterious problem and we just can't understand it despite all our attempts to investigate things, it's tempting to decide that it's a hardware problem. And sometimes it actually is. But a lot of the time it's actually software, just as a lot of the time what you think has to be a compiler bug is a bug in your code.
(If it's a hardware problem it's not something you can fix, so you can stop spending your time digging and digging into software while getting nowhere and frustrating yourself. This is also the appeal of it being a compiler bug, instead of your bug; if it's your bug, you need to keep on with that frustrating digging to find it.)
Why we care about being able to (efficiently) reproduce machines
One of the broad meta-goals of system administration over the past few decades has been working to be able to reliably reproduce machines, ideally efficiently. It wasn't always this way (why is outside the scope of this entry), but conditions have changed enough so that this became a concern for increasingly many people. As a result, it's a somewhat quiet driver of any number of things in modern (Unix) system administration.
There are a number of reasons why you would want to reproduce a machine. The current machine could have failed (whether it's hardware or virtual) so you need to reconstruct it. You might want to duplicate the machine to support more load or add more redundancy, or to do testing or experimentation. Sometimes you might be reproducing a variant of the machine, such as to upgrade the version of the base Linux or other Unix it uses. The more you can reproduce your machines, the more flexibility you can have with all of these, as well as the more confidence you can have that you understand your machine and what went into it.
One way of reproducing a machine is to take careful notes of everything you ever do to the machine, from the initial installation onward. Then, when you want to reproduce the machine, you just repeat everything you ever did to it. However, this suffers from the same problem as replaying a database's log on startup in order to restore its state; replaying all changes isn't very efficient, and it gets less efficient as time goes on and more changes build up.
(You may also find that some of the resources you used are no longer available or have changed their location or the like.)
The goal of being able to efficiently reproduce machines has led system administration to a number of technologies. One obvious broad area is systems where you express the machine's desired end state and the system makes whatever changes are necessary to get there. If you need to reproduce the machine, you can immediately jump from the initial state to your current final one without having to go through every intermediate step.
(The virtual machine approach where VMs are immutable once created can be seen as another take on this. By forbidding post-creation changes, you fix and limit how much work you may need to "replay".)
There are two important and interrelated ways of making reproducing a machine easier (and often more efficient). The first is to decide to allow some degree of difference between the original version and the reproduction; you might decide that you don't need exactly the same versions of every package or to have every old kernel. The second is to systematically work out what you care about on the machine and then only exactly reproduce that, allowing other aspects of the machine to vary within some acceptable range.
(In practice you'll eventually need to do the second because you're almost certain to need to change the machine in some ways, such as to apply security updates to packages that are relevant to you. And when you "reproduce" the machine using a new version of the base Unix, you very much need to know what you really care about on the machine.)
What the 'proto' field is about in Linux 'ip route' output (and input)
Today I was looking at the output of 'ip route list' on one of our Ubuntu 18.04 machines and noticed that a few of the routes looked different from the others:

10.99.0.0/16 via y.y.y.y dev eno1
172.16.0.0/16 via q.q.q.q dev eno1
x.x.x.x/26 via z.z.z.z dev eno2 proto static
The plain routes were routes to our internal "sandbox" networks and are set up by our local system to maintain them, while the route with "proto static" is an additional route added in this machine's netplan configuration. For reasons related to our solution to our problem with netplan and routes on 18.04, it would be very useful to be able to tell routes added by netplan apart from routes added by us, so I wondered if the "proto static" bit was a reliable indicator and what it actually meant.
The short summary is that this 'proto' is a theoretical route origin or routing protocol for whatever added the route, and can (in theory) be used to distinguish between different sources of routes. The routes without a "proto static" also have a routing origin associated with them, but it's the default one, so 'ip route list' doesn't print it by default (you can see it with 'ip -d route list'; it's 'proto boot'). So now some details.
The ip-route(8) manpage calls this field a RTPROTO and describes it this way:
[T]he routing protocol identifier of this route. RTPROTO may be a number or a string from the file /etc/iproute2/rt_protos. If the routing protocol ID is not given, ip assumes protocol 'boot' (i.e. it assumes the route was added by someone who doesn't understand what they are doing). Several protocol values have a fixed interpretation. [...]
(The manpage doesn't explain why you should set the 'proto' to something if you're adding routes through 'ip route', either manually or with scripts, or what value you should use.)
Another description of this is in the rtnetlink(7) manpage, where it's called the "route origin". The manpage mentions that these are (mostly) for user information and could be used to tag the source of routing information or distinguish between multiple routing daemons. This at least gives you some clues about why you would want to set a 'proto' and what you might use it for.
In both ip-route and rtnetlink, the 'boot' route origin is described as for routes added during boot. This is a bit misleading, because 'boot' is the default route origin (as ip-route's manpage suggests). If you use 'route add' to add routes the old fashioned way, or 'ip route add' with no 'proto ...' option, you get a "proto boot" route. Since our route setup system did not specify a route origin, all of our routes got this default (and then 'ip route list' omitted it by default).
The 'static' route origin is described by ip-route as "the route was installed by the administrator to override dynamic routing". This seems right for a route added through static configuration such as with netplan, but it means that other things might also add routes using this route origin. For example, NetworkManager on my Fedora laptop used 'proto static' for an assortment of routes that were established when I started up OpenVPN and L2TP VPN connections, but not for WireGuard ones.
(NetworkManager also used 'proto dhcp' for the default route obtained through DHCP.)
In practice it's not Netplan making the decision about the route origin for the routes it sets up, because on our Ubuntu 18.04 setup it's really based on systemd's networkd. Netplan writes out a networkd configuration and then has networkd implement it, so it's networkd that's actually setting the 'static' route origin here. Unsurprisingly, my networkd setup on my Fedora office workstation also winds up with the default route configured as a 'proto static' route.
Given all of this, our system to set up our "sandbox" routes should likely be setting up all of its routes with a unique route origin number (as 'ip route add ... proto NNN'). This would allow us to reliably identify (and manipulate) routes created by the system, while leaving all other routes alone, whether they were set up through a system's Netplan configuration or added by hand on the fly with 'ip route add ...'.
(If I'm reading the ip-route manpage correctly, this would also allow us to remove all of these special routes at once with 'ip route flush proto NNN'.)
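As a sketch of how this could look (the proto number 199 and the addresses are made up for illustration; pick a number not already listed in /etc/iproute2/rt_protos), the privileged commands are shown as comments, while the last command runs unprivileged:

```shell
# Add a sandbox route tagged with our own route origin (as root):
#   ip route add 10.99.0.0/16 via 10.1.1.1 dev eno1 proto 199
# List only the routes our tooling added:
#   ip route list proto 199
# Remove every route our tooling added, in one go:
#   ip route flush proto 199
# Unprivileged: show the route origin for all routes, including the
# default 'boot' origin that plain 'ip route list' omits.
ip -d route list
```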
As far as I can tell, the Linux kernel itself makes very little use of whatever value you set for the routing origin. The rtnetlink manpage explicitly says that user created values (more or less ones other than 'kernel', 'boot', and 'static') aren't used by the kernel at all. The kernel does seem to do some checks in bridging to see if the route is or isn't a 'kernel' one, but I haven't read the code to understand what's going on.
(The normal source of 'proto kernel' routes is the direct routes you get when you configure IP addresses and networks on interfaces.)
PS: Different systems appear to have different contents for /etc/iproute2/rt_protos. If you want to add your own entries, you probably want to use drop-in .conf files in /etc/iproute2/rt_protos.d/, since the iproute2 upstream changes the main file periodically.
Microsoft's Bingbot crawler is relentless for changing pages (it seems)
I look at the web logs for Wandering Thoughts every so often. There are many variations from day to day but regardless of what other things change, one thing is as predictable as the sun rising in the morning; every day some MSN Search IP address will be the top single source of traffic, as Bingbot crawls through here. This isn't at all new, as I wrote about Bingbot being out of control back in 2018, but it's somewhere between impressive and depressing just how long this has gone on.
(There are days when Bingbot isn't the top source of traffic, but those are days when someone has turned an abusive crawler loose.)
As it turns out, there is an interesting pattern to what Bingbot is doing. While it's pretty relentless and active in general, one specific URL stands out. Yesterday Bingbot requested the front page of Wandering Thoughts a whopping 1,400 times (today isn't over but it's up to 1,300 times so far). This is a running theme; my blog's front page is by far Bingbot's most requested page regardless of the day.
(Bingbot is also obsessed with things that it can't crawl; today, for example, it made 92 requests for a page that it's barred from with a HTTP 403 response.)
The front page of Wandering Thoughts changes at least once a day (more or less) when a new entry is published, and more often if people leave comments on recent entries (as this updates the count of comments for the entry). However, it doesn't update a hundred times a day even when people are very active with their comments, and Bingbot is being ten times more aggressive than that. I was going to say that Bingbot has other options to discover updates to Wandering Thoughts, such as my Atom syndication feeds, but it turns out that I long ago barred it from fetching a category of URLs here that includes those feeds.
(I have ambivalent feelings about web crawlers fetching syndication feeds. At a minimum, they had better do it well and not excessively, which based on present evidence I suspect Bingbot would not manage.)
Now that I've discovered this Bingbot pattern, I'm tempted to bar it from fetching the front page. The easiest thing to do would be to bar Bingbot entirely, but Bing is a significant enough search engine that I'd feel bad about that (although they don't seem to send me very much search traffic). Of course that might just transfer Bingbot's attention to another of the more or less equivalent pages here that it's currently neglecting, so perhaps I should just leave things as they are even if Bingbot's behavior irritates me.
PS: Of course there could be something else about the front page of Wandering Thoughts that has attracted Bingbot's relentless attention. The reasons for web crawlers to behave as they do are ultimately opaque; all I can really do is come up with reasonable sounding theories.
One major obstacle to unifying the two types of package managers
One common reaction on lobste.rs to my entry on how there are two types of package managers was to hope that the two types are unified somehow, or that people can work toward unifying them. Unfortunately, my view is that this currently has a major technical obstacle that we don't have good solutions for, which is the handling of multiple versions of dependencies.
A major difference between what I called program managers (such as Debian's apt) and module managers (such as Python's Pip) is their handling or non-handling of multiple versions of dependencies. Program managers are built with the general assumption of a single (global) version of each dependency that will be used by everything that uses it, while module managers allow each top level entity you use them on (program, software module, etc) to have different versions of its dependencies.
You can imagine a system where a module manager (like pip) hooks into a program manager to install a package globally, or a program manager (like apt) automatically also installs packages from a language source like PyPI. But any simple system like this goes off the rails the moment you have two versions of the same thing that you want to install globally; there's no good way to do it. Ultimately this is because we've made the historical decision in operating systems and language environments that we shouldn't consider version numbers.
In Unix, there is only one thing that can be /usr/include/stdio.h, and only one thing that can be a particular major version of a shared library. In a language like Python, there can be only one thing that is what you get when you do 'import package'. If two Python programs are working in the same environment, they can't do 'import package' and get different versions of the module. This versionless view of various sorts of namespaces (header files, shared libraries, Python modules, etc) is convenient and humane (no one wants to do 'import package(version=....)'), but it makes it hard to support multiple versions.
The state of the art to support multiple "global" versions of the same thing is messy and complex, and as a result isn't widely used. With no system support for this sort of thing, language package managers have done the natural thing and rolled their own approaches to having different environments for different projects so they can have different versions of dependencies. For example, Python uses virtual environments, while Rust and Go gather dependencies in their build systems and statically link programs by default. And to be clear here, modern languages don't do this to be obstinate, they do it because attempting to have a single global environment has been repeatedly recognized as a mistake in practice (just look at Go's trajectory here for one painful example).
(At the operating system level, often people punt and use containers.)
To have a chance of unifying program managers and module managers, we would have to come up with an appealing, usable, humane solution to this problem. This solution somehow has to work well with existing systems, practices, and languages, rather than assuming changes to practices and language implementations, since such changes are unlikely (as always, social problems matter). At the very least, this seems like a tall order.
At least from an outside perspective, Ubuntu is Canonical's thing
In some ways, Ubuntu and Debian look pretty similar to each other. They use more or less the same set of packages (because Ubuntu is mostly based on Debian packages) and they operate relatively similarly, to the extent that we could probably replace our use of Ubuntu with Debian without noticing much change. But there is a big difference. Debian is a community with a particular long-standing philosophy and set of practices, while Ubuntu is Canonical's thing, not a community.
It's true that Ubuntu has a community of people who contribute to it despite Canonical not paying them for their time and work. But this community doesn't get to make real decisions on anything that Canonical cares about, any more than the CentOS (community) board gets to overrule IBM's views on how CentOS should operate. If and when Canonical says 'Ubuntu is doing X' (or 'not doing X'), that's what happens. In a way, there's nothing particularly nefarious about this in the case of Ubuntu; Canonical founded it and has always paid for it and run it, and we've just enjoyed the ride.
(There are benefits to being on Canonical's ride. Contrast the extremely regular and easy to plan for Ubuntu LTS release schedule with the somewhat more anarchic and unpredictable Debian release schedule, for example.)
Or to put it simply, Ubuntu is a Canonical product that Canonical has managed to attract some outside contributions to (beyond PPAs).
As far as I'm concerned, this means that I'm much more inclined to blame Canonical for various aspects of Ubuntu (including the periodic package roulette) than put it on anyone else. Canonical "owns" Ubuntu in a way that no one else does, and so they get to be blamed for the effects of all of the decisions made on Ubuntu. Either Canonical made them, or Canonical didn't care enough to pay attention.
Would some aspects of Ubuntu be better with more community work and contributions (for instance, more actively moving package updates from Debian into Ubuntu LTS)? Assuming that Canonical allowed them, probably, but it's hard for me to see how Canonical could attract them, since if you work on Ubuntu you're voluntarily working on Canonical's product for free. I would expect many people who want to make significant contributions for free to go work on Debian instead.
Perhaps this is not quite how Ubuntu really works. If so, how Canonical (and the community) operate and talk about Ubuntu don't make that at all clear to outside people. It certainly seems that Canonical routinely speaks for Ubuntu, and major technical decisions are routinely made by Canonical (for example, netplan).
(None of this is exactly news, but I feel like writing it down.)
Use virtual environments to install third-party Python programs from PyPI
I've historically not been a fan of Python's virtual environments. While they're easy to set up these days, they're relatively heavyweight things (taking up tens of megabytes), they contain embedded references to the Python version they were built with, and it felt like vast overkill for simply installing a program (such as the Python LSP server) and the third party packages it required from PyPI. So I've historically used Pip's "user" install mode ('pip install --user'), which puts everything into your $HOME/.local directory tree.
However, I now feel that this is a mistake (although an attractive one). Instead, you should create virtual environments for any third party commands you install from PyPI or elsewhere.

The problem is that Pip's "user" mode involves pretending that Pip is basically a Unix distribution's package manager that just happens to be operating on $HOME/.local. This is an attractive illusion and it sort of works, but in practice you run into issues over time when you upgrade things, especially if you have more than one program installed. You'll experience some of these issues with virtual environments as well, but with single purpose virtual environments (one venv per program) and keeping track of what you installed, the ultimate brute force solution is to delete and recreate the particular virtual environment. The dependency versions are getting tangled? Delete and recreate. You've moved to a new distribution version of Python (perhaps you've upgraded from one Ubuntu LTS to another)? It sounds like a good time to delete and recreate, rather than dealing with version issues.
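As a minimal sketch of the one-venv-per-program approach (the venv location and the python-lsp-server example are my own choices, not a standard), the full cycle looks like this; the network-touching install step is shown as a comment:

```shell
# Pick a home for this program's venv (location is up to you).
VENV_DIR="${VENV_DIR:-$(mktemp -d)}/pylsp"

# One virtual environment per program:
python3 -m venv "$VENV_DIR"

# Install the program (and its dependencies) into that venv alone:
#   "$VENV_DIR/bin/pip" install python-lsp-server
# Optionally symlink just the command onto your $PATH:
#   ln -s "$VENV_DIR/bin/pylsp" "$HOME/.local/bin/pylsp"

# When dependency versions get tangled, or you've moved to a new
# distribution version of Python, delete and recreate:
#   rm -rf "$VENV_DIR"   # then redo the steps above
```

The symlink step means the venv stays an implementation detail; day to day you just run the command.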
More broadly, it feels to me that the Python packaging world is moving strongly toward using virtual environments as the solution to everything. As a result, I don't expect fundamental tools like Pip to spend much development effort on improving management of "user" mode installs. If anything, I expect Pip's user install mode to either quietly decay over time or to get deprecated at some point.
Since I've only recently come around to this view (after actively investigating the situation around upgrading programs with pip), I have no opinions on any of the programs and systems that are designed to make this easier. Pipx was mentioned in a comment by Tom on yesterday's entry, so I'll probably look at that first.
(I do think there are some uses for a Pip "user" install of PyPI packages, but that's for another entry.)