Using WireGuard seriously as a mesh network needs a provisioning system

May 10, 2025

One thing that my recent experience expanding our WireGuard mesh network has driven home to me is how (and why) WireGuard needs a provisioning system, especially if you're using it as a mesh networking system. In fact I think that if you use a mesh WireGuard setup at any real scale, you're going to wind up either adopting or building such a provisioning system.

In a 'VPN' WireGuard setup with a bunch of clients and one or a small number of gateway servers, adding a new client is mostly a matter of generating some critical information and giving it to the client. It's possible to more or less automate this and make it relatively easy for people who want to connect to you. You'll still need to update your WireGuard VPN server too, but at least you only have one of them (probably), and it may well be the host where you generate the client configuration and provide it to the client's owner.

The extra problem with adding a new client to a WireGuard mesh network is that there are many more WireGuard nodes that need to be updated (and the new client also needs a lot more information; it needs to know about all of the other nodes it's supposed to talk to). More broadly, every time you change the mesh network configuration, every node needs to update with the new information. If you add a client, remove a client, or a client changes its keys for some reason (perhaps it had to be re-provisioned because the hardware died), nodes need updates (or at least the nodes that talk to the changed node do). In the VPN model, only the VPN server node (and the new client) need updates.
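
As a rough illustration of the difference in update fan-out, here is a small Python sketch. The `peers` mapping and the function name `nodes_needing_update` are hypothetical, made up for this example; the only assumption is that you have some record of which nodes are configured to talk to which.

```python
def nodes_needing_update(peers: dict[str, set[str]], changed: str) -> set[str]:
    """Return every node whose WireGuard configuration mentions `changed`,
    plus `changed` itself (its own configuration has to be rebuilt too)."""
    affected = {node for node, talks_to in peers.items() if changed in talks_to}
    affected.add(changed)
    return affected


# VPN model: clients only know about the server.
vpn = {
    "server": {"c1", "c2", "c3"},
    "c1": {"server"},
    "c2": {"server"},
    "c3": {"server"},
}

# Full mesh: every node knows about every other node.
names = ["n1", "n2", "n3", "n4"]
mesh = {n: set(names) - {n} for n in names}

print(sorted(nodes_needing_update(vpn, "c2")))   # ['c2', 'server']
print(sorted(nodes_needing_update(mesh, "n2")))  # ['n1', 'n2', 'n3', 'n4']
```

In the VPN case the answer stays at two nodes no matter how many clients there are; in the full mesh it is always every node.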

Our little WireGuard mesh is operating at a small scale, so we can afford to do this by hand. As you have more WireGuard nodes and more changes in nodes, you're not going to want to manually update things one by one, any more than you want to do that for other system administration work. Thus, you're going to want some sort of a provisioning system, where at a minimum you can say 'this is a new node' or 'this node has been removed' and all of your WireGuard configurations are regenerated and propagated to WireGuard nodes, WireGuard configuration reloads are triggered, and so on. Some amount of this can be relatively generic in your configuration management system, but not all of it.

(Many configuration systems can propagate client-specific files to clients on changes and then trigger client side actions when the files are updated. But you have to build the per-client WireGuard configuration.)
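
To make 'build the per-client WireGuard configuration' concrete, here is a minimal Python sketch that turns one shared inventory into a different peer list for each node. Everything about the inventory is an assumption for illustration: the `Node` class, the node names, the addresses, and the placeholder keys are all invented. The `[Peer]`, `PublicKey`, `AllowedIPs`, and `Endpoint` names are the standard WireGuard configuration keys. Each node's own `[Interface]` section (with its private key) is assumed to live on the node and is not generated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    public_key: str   # base64 WireGuard public key (placeholders below)
    endpoint: str     # "host:port" that other nodes connect to
    allowed_ips: str  # the node's address inside the mesh


def peer_section(node: Node) -> str:
    """Render one [Peer] section describing `node` to some other node."""
    return "\n".join([
        f"# {node.name}",
        "[Peer]",
        f"PublicKey = {node.public_key}",
        f"AllowedIPs = {node.allowed_ips}",
        f"Endpoint = {node.endpoint}",
    ])


def peers_config_for(me: str, inventory: list[Node]) -> str:
    """Build the peer sections for node `me`: every node except itself."""
    if all(n.name != me for n in inventory):
        raise KeyError(f"unknown node: {me}")
    # Sort by name so the output only changes when the mesh really changes.
    others = sorted((n for n in inventory if n.name != me), key=lambda n: n.name)
    return "\n\n".join(peer_section(n) for n in others) + "\n"


# Hypothetical three-node mesh; the keys are placeholders, not real keys.
inventory = [
    Node("alpha", "ALPHA-PUBLIC-KEY=", "alpha.example.org:51820", "10.10.0.1/32"),
    Node("beta", "BETA-PUBLIC-KEY=", "beta.example.org:51820", "10.10.0.2/32"),
    Node("gamma", "GAMMA-PUBLIC-KEY=", "gamma.example.org:51820", "10.10.0.3/32"),
]

for node in inventory:
    print(f"--- peers for {node.name} ---")
    print(peers_config_for(node.name, inventory))
```

The generated text is what a configuration management system would then push to each node; adding or removing a node means editing the inventory and re-running the generation for everyone.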

PS: I haven't looked into systems that will do this for you, either as pure WireGuard provisioning systems or as bigger 'mesh networking using WireGuard' software, so I don't have any opinions on how you want to handle this. I don't even know if people have built and published things that are just WireGuard provisioning systems, or if everything out there is a 'mesh networking based on WireGuard' complex system.


Comments on this page:

From 104.28.104.15 at 2025-05-11 08:31:32:

i strongly urge you to check out tailscale. it’s one of the few products i want to throw money at because they do such a good job.

https://tailscale.com/

By vcarceler@elpuig.xeill.net at 2025-05-12 15:57:56:

Do you know innernet?

https://github.com/tonarino/innernet

It simplifies the management of mesh networks with WireGuard.

By cks at 2025-05-12 16:21:04:

Innernet looks like an interesting approach to the general problem and I hadn't heard of it before; thanks! We may explore it if our WireGuard mesh needs (for servers) scale up.

By Simon at 2025-05-13 10:27:39:

Some amount of this can be relatively generic in your configuration management system, but not all of it.

What's the challenge with using your normal configuration management system for this?

IIUC you are talking about your servers here, so this should be relatively slow changing (so you can deploy this like any configuration change without special handling for hot-reloading and stuff like that) and no tricky special cases (like the only link to a node is the connection you are about to reconfigure).

By cks at 2025-05-13 11:29:22:

The general problem is that in the simplest configuration, every node needs every other node's public configuration, but not its own public configuration. This means every node has a different configuration file for its peers. This gets worse if you want to filter node visibility for access control, so that sensitive nodes don't have peer configurations for other nodes that shouldn't be talking to them anyway. You need to automate the generation of these per-node peer lists somehow in order to have a real provisioning system, and most configuration management systems aren't built for such per-host dynamically generated configuration files.
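
A minimal Python sketch of the visibility filtering described above, under made-up assumptions: each node belongs to exactly one group, and a set of group pairs says who may talk to whom. The names `visible_peers`, `node_group`, and `allowed_pairs`, and the example groups, are all hypothetical.

```python
def visible_peers(
    me: str,
    node_group: dict[str, str],
    allowed_pairs: set[frozenset[str]],
) -> list[str]:
    """Return the sorted names of the nodes that `me` should have peer
    configurations for. A pair of groups is stored as a frozenset, so the
    rule is symmetric; a one-element frozenset allows traffic within a group."""
    my_group = node_group[me]
    return sorted(
        other
        for other, group in node_group.items()
        if other != me and frozenset((my_group, group)) in allowed_pairs
    )


node_group = {
    "web1": "web",
    "web2": "web",
    "db1": "db",
    "db2": "db",
    "backup1": "backup",
}

allowed_pairs = {
    frozenset({"web"}),           # web nodes talk to each other
    frozenset({"web", "db"}),     # web nodes talk to database nodes
    frozenset({"db", "backup"}),  # the backup node talks to database nodes
}

print(visible_peers("web1", node_group, allowed_pairs))     # ['db1', 'db2', 'web2']
print(visible_peers("db1", node_group, allowed_pairs))      # ['backup1', 'web1', 'web2']
print(visible_peers("backup1", node_group, allowed_pairs))  # ['db1', 'db2']
```

Every node ends up with a different list, which is the part that generic 'push this file to these hosts' machinery doesn't give you for free.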

The only time you get a straightforward situation is if you have a pool of client nodes that talk to a pool of server nodes but not each other or anything else. Then you can build a single file of server node peer configurations and push that out to all of the clients in the pool.

«If you add a client, remove a client, a client changes its keys for some reason (perhaps it had to be re-provisioned because the hardware died), all of these means nodes need updates (or at least the nodes that talk to the changed node). In the VPN model, only the VPN server node (and the new client) needed updates.»

If the mesh network is physically a mesh network there are some complications, because routes across the mesh need to be computed. In that case the best solution, as with the UUCP network, is to distribute full or "most interesting" shared lists or maps of the mesh to each host, not just lists of its immediate neighbors.

If the people who write mesh network software were smarter they would require not one per-node configuration, but two things: a per-mesh (or per-mesh-subset) list/map and the name of the current node, and then work out everything else by walking the list/map from that node name. The same effect can be achieved by writing a location-independent configuration in some kind of configuration management system and a script to convert it to one from the point of view of a specific node, but it would be much simpler if the mesh software were written to do that itself.

But if you are using WireGuard purely for "secure" communications there is no need to have a mesh network (of tunnels) with a distinct VPN from each host to each other host, any more than there is any need to have an SSH mesh network. "VPN" is one of those usually damaging "conventional wisdom" notions like "top of rack switch".

All that is needed is some system for encrypting IP packet payloads with some key known to the destination. It turns out that such a system is called IPsec and it has been around since 1995-1998. Too bad that IPsec itself is often misused to set up "VPN"s (tunnels).

With GNU/Linux the IPsec situation is particularly nice because the commonly used strongSwan package can use SSH keys directly for IPsec, so setting up any number of IPsec nodes is trivial and they can all have pretty much the same configuration.

https://www.sabi.co.uk/blog/14-two.html?141211#141211 "IPsec possibilities and realities"

One of the usual questions, as with SSL, SSH (or even Kerberos), is whether to have a different key for every node or to share the same key across all nodes or some subsets of nodes, which simplifies key distribution (but makes more work for key retraction).

Written on 10 May 2025.

Last modified: Sat May 10 22:45:36 2025