What sort of server it takes to build Firefox in four and a bit minutes

March 6, 2022

I tweeted:

Our new compute server builds Firefox Nightly from source in 4 minutes 22 seconds, as well it should given how beefy it is. The build process does manage to get the machine to 100% usage of all CPUs, which is impressive (and nice) since there are 112 of them.

Then in response to a question about the link phase's duration and parallelism:

The usage of all CPUs lasts about two minutes, with a ramp up at the start and down at the end, then some wiggly usage of a lot fewer CPUs, with a couple of periods where only three are being used (and up to 10-13 in use). I can't see a clear link phase in the build report.

(This is for a full from scratch build.)

This is a (very) big server but in some respects not an unusual one; subject to component availability (and money), anyone could get their own version fairly easily. It has two AMD Epyc 7453 28-core processors (hence the odd number of CPUs: two sockets times 28 cores times two hardware threads per core is 112), 512 GB of RAM, and two NVMe drives in a software RAID mirror.

(Or instead of buying one, you could probably rent time on an equivalent server from a cloud provider on a minute by minute basis.)

As officially reported in Ubuntu 18.04, the server has only two NUMA zones:

node distances:
node   0   1 
  0:  10  32 
  1:  32  10 

I'm not sure this is completely accurate for Zen 3 Epycs in practice, but it's the official view.
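
If you want to check this sort of thing across a fleet of machines, the distance table is easy to pull apart. What follows is a minimal sketch, not something we actually run. It assumes the text comes from something that prints the table in the format shown above (it looks like the tail end of `numactl --hardware` output, but that is my assumption), and `parse_node_distances` and `SAMPLE_OUTPUT` are names made up for illustration.

```python
# Sketch: parse a "node distances:" table like the one shown above.
# Assumption: the table has a "node distances:" line, then a header row
# of node ids, then one "N: d d ..." row per node.

SAMPLE_OUTPUT = """\
node distances:
node   0   1
  0:  10  32
  1:  32  10
"""


def parse_node_distances(text):
    """Return {from_node: {to_node: distance}} parsed from the table."""
    lines = text.splitlines()
    start = next(
        (i for i, line in enumerate(lines) if line.strip() == "node distances:"),
        None,
    )
    if start is None or start + 1 >= len(lines):
        raise ValueError("no 'node distances:' table found")

    # Header row looks like "node   0   1"; skip the leading word.
    cols = [int(tok) for tok in lines[start + 1].split()[1:]]

    distances = {}
    for line in lines[start + 2:]:
        row, sep, values = line.partition(":")
        # Stop at the first line that is not an "N: ..." table row.
        if not sep or not row.strip().isdigit():
            break
        vals = [int(v) for v in values.split()]
        if len(vals) != len(cols):
            break
        distances[int(row)] = dict(zip(cols, vals))
    return distances


def numa_node_count(text):
    """Number of NUMA nodes (zones) the table describes."""
    return len(parse_node_distances(text))
```

Calling `numa_node_count(SAMPLE_OUTPUT)` on the table above gives the two zones the kernel reports, which is all this can tell you; it says nothing about whether the hardware really behaves that way.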

One interesting effect of the NVMe drives (and perhaps other things) is that a build with a cold disk cache for the Firefox Nightly source tree is only a few seconds slower than a build with a warm cache. However, the NVMe drives do slow down enough under load that despite Linux software RAID preferring to read from the first disk, it sometimes reads from the second drive, although nowhere near as much. The result is that the second NVMe drive reports a lower average IO time than the first one, presumably because it is the less heavily loaded of the two.

(All of this is captured with a certain degree of granularity through our Prometheus based metrics system.)
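
For the curious, here is roughly how a per-drive average read time can be worked out from the kernel's own counters. This is a sketch under assumptions, not how our metrics system does it: it takes two snapshots of `/proc/diskstats` text and divides the change in milliseconds spent reading by the change in reads completed. The function names and the snapshot numbers are invented for illustration (the device names are just typical NVMe ones).

```python
# Sketch: average read time per device from two /proc/diskstats snapshots.
# In each line, after splitting on whitespace: field 2 is the device name,
# field 3 is reads completed, field 6 is milliseconds spent reading.

# Made-up snapshots, purely illustrative.
BEFORE = """\
259 0 nvme0n1 1000 0 8000 200 0 0 0 0 0 0 0
259 1 nvme1n1 100 0 800 10 0 0 0 0 0 0 0
"""
AFTER = """\
259 0 nvme0n1 5000 0 40000 1400 0 0 0 0 0 0 0
259 1 nvme1n1 600 0 4800 110 0 0 0 0 0 0 0
"""


def read_counters(diskstats_text):
    """Map device name -> (reads completed, milliseconds spent reading)."""
    counters = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 7:
            continue  # not a full diskstats line
        counters[fields[2]] = (int(fields[3]), int(fields[6]))
    return counters


def average_read_ms(before_text, after_text):
    """Average milliseconds per read between the two snapshots."""
    before = read_counters(before_text)
    after = read_counters(after_text)
    result = {}
    for dev, (reads1, ms1) in after.items():
        if dev not in before:
            continue
        reads0, ms0 = before[dev]
        delta_reads = reads1 - reads0
        if delta_reads > 0:  # skip devices with no reads in the interval
            result[dev] = (ms1 - ms0) / delta_reads
    return result
```

On a live system you would read `/proc/diskstats` twice with some interval in between and pass both texts to `average_read_ms`. With the invented numbers here, the busier first drive comes out with the higher average, which is the shape of what we see.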

There was a day when a system like this would have been essentially impossible for us to have; something this big and powerful would have been far too expensive for us to afford (and it would also have been physically very large and probably would have required special power). I rather enjoy how the world has changed to bring more and more computational power into our hands.

Comments on this page:

What is the price tag for this server?

By cks at 2022-03-07 08:54:25:

It looks like the list price of this server configuration is around $12,000 US (we got somewhat different pricing as an educational institution). It's a 1U server, which also feels amazing to me in its own way.

(It also feels odd to me that this is now something you can casually price out online in various reseller system configurators, instead of having to call a salesperson because it's too unusual to be in the catalog.)

By MikeP at 2022-03-10 12:50:36:

Weird indeed. From 2005 or so til 2008 when I changed jobs, I was the primary systems administrator for an SGI Altix whose final configuration was 64 Itanium2 CPUs, 192GB of RAM, and a few mumblety TB of disk. The MSRP on that bad boy was somewhere north of a million dollars, not that anybody ever paid list. It occupied two nearly-full full racks, and the power requirements weren't "weird" or anything if you had a regular data centre, but they were a fair amount - IIRC 2 x 240V 30a circuits, one per rack, and we didn't have redundant power. We ran SGI's Linux distributions on it, one was RHEL-based and the other SLES (I forget which order they came in).

It had more than two NUMA zones though. :-) IIRC the architecture was you grouped CPUs and RAM into C-bricks and each C-brick was its own zone. Up to 4 CPUs and some amount of RAM per brick. When we first acquired it, it was only 16 CPUs, 64GB and a single rack.

Naturally, researchers mostly wrote single-threaded applications for it that they compiled with gcc (vs the licensed Intel compilers that blew the doors off gcc at the time), and generally used it the same way they would have 64 separate P4 workstations. Oh well.

It's amazing how inefficient compiling Firefox is. It should take less than a second, but that would only result from proper design at each stage.

Written on 06 March 2022.

Last modified: Sun Mar 6 23:57:03 2022