Thinking about two different models of virtualization hosts

November 8, 2020

The conventional way to make a significant commitment to on-premise virtualization is to buy a few big host machines that will each run a lot of virtual machines (possibly with some form of shared storage and virtual machine motion from one server to another). In this model you're getting economies of scale and also taking advantage of fluctuating usage of your virtual machines over time; probably not all of them are active at once, so you can to some degree over-subscribe your hosts.

It has recently struck me that there is another approach, where you have a significant number of (much) smaller host machines, each supporting only a few virtual machines (likely without shared storage). This approach has some drawbacks but also has a number of advantages, especially over not doing virtualization at all. The first advantage is that you can deploy hardware without deciding what it's going to be used for; it is just a generic VM host, and you'll set up actual VMs on it later. The second advantage is that it is much easier to deploy actual 'real' servers, since they're now virtual instead of physical; you get things like KVM over IP and remote power cycling for free. On top of this you may get more use out of even moderate-sized servers, since these days even basic 1U servers can easily be too big for many server jobs.

(You may also be able to make your hardware more uniform without wasting resources on servers that don't 'need' it. If you put 32 GB into every server, you can either run one service that needs 32 GB or several VMs that only need 4 GB or 8 GB. You're not stuck with wasting 32 GB on a server that only really needs 4 GB.)

Having only a few virtual machines on each host machine reduces the blast radius when a host machine fails or has to be taken down for some sort of maintenance (although a big host machine may be more resilient to failure). It also makes it easier to progressively upgrade host machines; you can buy a few new ones at a time, spreading out the costs. And you spend proportionally less money on spares; one or two spare or un-deployed machines cover your entire fleet.

(This also makes it easier to get into virtualization to start with, since you don't need to make a big commitment to expensive hardware that is only really useful for virtualization. You just buy more or less regular servers, although perhaps somewhat bigger than you'd otherwise have done.)

However, this alternate approach has a number of drawbacks. Obviously you have more machines and more hardware to manage, which is more work. You will likely spend more time planning out and managing what VM goes on what host, since there is less spare capacity on each machine and you may need to carefully limit the effects of any given host machine going down. Without troublesome or expensive shared storage, you can't rapidly move VMs from host to host (although you can probably migrate them slowly, by taking them down and copying data over).

In a way you're going to a lot of effort just to get KVM over IP and remote management capabilities, and to buy somewhat fewer machines (or have less unused server capacity). The smaller the host machines and the fewer VMs you can put on them, the more pronounced this is.

(But at the same time I suspect that there is a sweet spot in the cost versus server capacity in CPU, RAM, and disk space that could be exploited here.)


Comments on this page:

From 193.219.181.219 at 2020-11-09 04:36:03:

Yes, having to play VM Tetris is a problem...

Some virtualization systems do support migration without shared storage and without taking the VM down first, by iteratively applying updates as they occur. It seems QEMU (KVM) and OpenStack call it "block live migration".
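For QEMU/KVM this can be driven through virsh or the libvirt API. A minimal sketch using the libvirt Python bindings might look like the following (the guest name and host URIs are made up, and the destination host needs disk images of the right size already in place, since there's no shared storage to lean on):

    import libvirt

    GUEST = "webvm1"                        # hypothetical guest name
    DEST = "qemu+ssh://vmhost2/system"      # hypothetical destination host

    src = libvirt.open("qemu:///system")
    dst = libvirt.open(DEST)
    dom = src.lookupByName(GUEST)

    # VIR_MIGRATE_NON_SHARED_DISK copies the disk contents over as part of
    # the live migration ("block live migration"), so no shared storage is
    # needed; PERSIST_DEST and UNDEFINE_SOURCE make the move permanent.
    flags = (libvirt.VIR_MIGRATE_LIVE
             | libvirt.VIR_MIGRATE_PERSIST_DEST
             | libvirt.VIR_MIGRATE_UNDEFINE_SOURCE
             | libvirt.VIR_MIGRATE_NON_SHARED_DISK)

    dom.migrate(dst, flags, None, None, 0)
    src.close()
    dst.close()

The roughly equivalent command line is 'virsh migrate --live --copy-storage-all --persistent --undefinesource webvm1 qemu+ssh://vmhost2/system'.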

We use Hyper-V at work, and although I haven't used its live migration due to CPU mismatches between VM servers, it also has replication, which copies the storage in much the same way. So whenever I want to move a VM with minimal downtime, I enable replication, wait until the initial transfer is done, then shut down the old VM and immediately start up the new one, with a few minutes of downtime.

By Miksa at 2020-11-09 09:50:20:

We have a few of those kinds of standalone virtual hosts, and they are enough of an inconvenience that they should be restricted to special uses only. They have worked best with a KVM host and virtual machines that are administered by the same group. In our case all of those need to have their maintenance windows at the same time. The maintenance starts on the host server by disabling autostart for all the virtual machines. After that the virtual machines can update themselves and power off. After all the virtual machines have powered off, the host server can reboot itself. After the reboot the host needs to start all the virtual machines in the correct order and set them back to autostart. Disabling autostart is necessary; otherwise, by the time the last virtual machine has powered off, the host has already started some of the others.
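As a concrete sketch of that autostart juggling on a KVM host, something along these lines could be scripted with the libvirt Python bindings (the guest names and boot order are hypothetical; the 'prepare' and 'wait' phases run before the host reboot, 'restart' runs after it):

    import sys
    import time
    import libvirt

    BOOT_ORDER = ["dbvm", "appvm", "webvm"]   # hypothetical guests, in start order

    def prepare(conn):
        # Disable autostart on every defined guest so the host reboot
        # doesn't bring some of them back up too early.
        for dom in conn.listAllDomains():
            dom.setAutostart(0)

    def wait_for_poweroff(conn):
        # The guests update themselves and power off on their own; wait
        # until none are left running, then the host itself can reboot.
        while conn.listAllDomains(libvirt.VIR_CONNECT_LIST_DOMAINS_ACTIVE):
            time.sleep(60)

    def restart_guests(conn):
        # After the host is back up: start guests in the desired order
        # and re-enable autostart for normal operation.
        for name in BOOT_ORDER:
            dom = conn.lookupByName(name)
            dom.create()
            dom.setAutostart(1)

    if __name__ == "__main__":
        conn = libvirt.open("qemu:///system")
        phase = {"prepare": prepare, "wait": wait_for_poweroff,
                 "restart": restart_guests}
        phase[sys.argv[1]](conn)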

With host server and virtual machines administered by different groups the overhead for maintenance grows considerably.

Compare this to our VMware clusters. After the most recent ESXi updates were released my coworker updated half a dozen clusters with about 30 host servers in 2 days. During that process the virtual machines were vMotioned between hosts thousands of times.

Shared storage is such a major convenience factor that you should aim for it from the start, but there can be compromises. The hosts can use both local and shared storage: the virtual machines run normally on the local storage but can be moved to shared storage during maintenance. But this would be extra work and you would miss out on high availability; if a host server malfunctions, you will have a hard time moving the virtual machines to a different host. There is of course vSAN and its competitors, which may be the way to go if you don't have a powerful SAN or NFS infrastructure.

Of course it's feasible to build a cluster with shared storage using small host servers; it's just that storage and licensing costs could drive you towards bigger hosts. We used to have blade servers with 256 GB of RAM as hosts, but years ago I calculated that it would make economic sense to have at least triple the RAM because of licensing costs.

@193.219.181.219

It sounds like you have some kind of Hyper-V cluster. Wouldn't it make sense to limit the processor features to the common minimum? Our host servers range in age from less than a year to 5+ years, and VMware is set to advertise processor features from the oldest generation. Which reminds me, after our latest expansion we decommissioned the oldest generation, so we could finally upgrade the processor level by a notch.

By cowardlyAnonymous at 2020-11-13 07:03:13:

Virtualisation is usually sold not on technical merits (it has close to none and makes for a lot more trouble, both technical and in workload), but on the economic advantages of "consolidation", that is, saving money on hardware by oversubscribing larger machines that cost less per SPEC/GiB/TPC than smaller machines, so the discussion on the merits of few-big vs. many-small hosts is largely pointless. Besides that, saving on hardware is usually a bad idea; it is much better to overprovide hardware to save on complexity and ops effort.

Anyhow the real reason why virtualization is popular has nothing to do with overall cost savings or higher flexibility, and it is entirely about office politics: it shifts system administration costs and troubles from the operations team to the development teams, and it defines sharper boundaries between them; this is the idea called "DevOps" (which is a euphemism for shifting costs and workload from ops to dev teams, by requiring developers to handle operations too).

That's because there are two popular models (containers are a bit in-between) of app deployment:

  • There are ops machines which run one or more apps, and the running apps are produced by the dev teams but installed and managed, together with the OS on which they run, by the ops team.
  • There are ops machines which run one or more VMs, which the ops team treats as "black boxes", and the dev teams install and manage the OS and apps in those VMs.

In the latter case the only app (and runtime environment) for which the ops team is accountable is the virtualization layer, operation of all other apps (and their runtime environments) has been shifted to the dev teams.

This is usually a terrible idea for the business, but a very good idea for the ops team management, so it is not surprising that many of them who feel their ops teams are very under-resourced (as they usually are) are strongly advocating private clouds to shift a large part of their workload to the dev teams.

There is a special case where the ops team also do some dev work, e.g. for system administration infrastructure, and then the VMs are somewhat convenient.

By Miksa at 2020-11-13 10:29:35:

@cowardlyAnonymous

I strongly disagree. A great merit of virtualization is that it separates the service/application from the hardware. In the time it takes me to install a new virtual server, our datacenter custodian barely manages to receive the delivery men, haul the package in, and take the server out of the box. After that comes wrestling the server into the racks, connecting the half a dozen or more cables, labeling everything, and adding the info to the server registry. Then we in the ops team can take over and continue with configuring BIOS, IPMI, RAID, etc. Only after all that comes the time for the PXE boot and the actual OS installation; this step is basically identical for physical and virtual servers. And then there is the continuous maintenance. I'm sure the developers appreciate it when we inform them that there will be an outage because the RAID cache battery has failed and the server needs to shut down so someone can go yank it out of the chassis and install a fresh battery.

The expense issue is about much more than oversubscribing. Lots of services could manage with the power of an original Raspberry Pi, but when you add dual PSUs, hard drives, RAID, IPMI, and all the other features you want if you have to deal with hardware, you are already spending in the thousands. With physical servers you need to overprovide, because adding more resources is a huge hassle. With virtual servers this is trivial, just a matter of a few mouse clicks. "What, two CPUs wasn't enough for you? Here, have a dozen."
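As an illustration of how little ceremony that resize involves on the KVM side (the guest name and sizes here are hypothetical; in VMware or Hyper-V it really is a few clicks in the GUI), a sketch with the libvirt Python bindings:

    import libvirt

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("appvm")   # hypothetical guest

    # Raise the persistent configuration to 12 vCPUs and 32 GiB of RAM
    # (libvirt takes memory in KiB); this takes effect on the next boot
    # of the guest.
    dom.setVcpusFlags(12, libvirt.VIR_DOMAIN_AFFECT_CONFIG
                          | libvirt.VIR_DOMAIN_VCPU_MAXIMUM)
    dom.setVcpusFlags(12, libvirt.VIR_DOMAIN_AFFECT_CONFIG)
    dom.setMemoryFlags(32 * 1024 * 1024, libvirt.VIR_DOMAIN_AFFECT_CONFIG
                                         | libvirt.VIR_DOMAIN_MEM_MAXIMUM)
    dom.setMemoryFlags(32 * 1024 * 1024, libvirt.VIR_DOMAIN_AFFECT_CONFIG)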

For us the politics don't matter. Whether the servers are physical or virtual, the ops team only provides up to a configured OS install. Installing Nginx and PostgreSQL or whatever is the responsibility of the developers or software admins. We handle the monthly OS updates and reboots, and if you installed your software from a proper repository we will update it too, free of charge. Any software that doesn't come from a repo is your problem alone. In your two models it would be almost as easy for the ops team to provide OS-less physical servers and let the devs do the rest. Or, like in our environment, the devs don't have enough access on the virtual servers to install an OS on them; we in the ops team have to do that.

Containers are a special case, but it's not the ops team that is clamoring for Docker; if it were up to me, Docker would be banned in our environment. It's the developers who want to use it, because they want to do the duties that are part of the ops team's job description, but they do them badly. They just want to do docker pulls, but they don't realize the software doesn't update itself the way it does with the ops team's yum updates. When my coworker did our first inventory of Docker use, we were horrified by how many ancient containers were running on the servers.

By cowardlyAnonymous at 2020-11-13 11:20:47:

«Installing Nginx and PostgreSQL or what ever is the responsibility of the developers or software admins.»

In model #1 that is what ops people do: they manage the apps and their runtime environment, which includes things like Nginx or PostgreSQL. Your claim here is actually that in the situation at your site there has been a switch to model #2, even if the ops team provides the OS base layer.

The central feature of model #2 is that accountability for operations beyond the VM infrastructure has been switched away from the ops team. Having a pre-installed base OS layer in the VM is a small detail; the crucial thing as to complexity and accountability is the app and its runtime environment. Even cloud providers give you pre-installed base layers, and then all the rest is the dev team's problem.

«In your two models it would be almost as easy for the ops team to provide OS-less physical servers and let the devs do the rest.»

That does happen in a few places. But in order to push through the politics of shifting the accountability for apps and their environment to devs one needs an excuse, and the excuse as I wrote is "consolidation", that is saving money on hardware through larger servers and their overcommitment.

«With physical servers you need to overprovide, because adding more resources is a huge hassle.»

It is very easy: you just keep a pool of spare servers. What is a hassle is to migrate applications that grow to bigger servers, but then that is a problem also with the VM infrastructure:

«With virtual servers this is trivial, just a matter of few mouse clicks. "What, two CPUs wasn't enough for you. Here, have a dozen."»

That means that the site is indeed over-providing, if there are a dozen spare CPUs available with a click (plus one has to over-provide anyhow, as running the VM infrastructure usually adds an overhead of 15-20% of capacity, especially for storage-intensive apps).

That requires that there be a large global pool of spare physical resources available on demand, which implies that the VM infrastructure provides completely transparent VM migration from one host to another (whichever one has the spare CPUs or memory or storage etc.), which is nightmarishly complex.

That then reduces to the argument that "consolidation" indeed saves money, but in most places the savings from "consolidation" are trivial compared to the enormous complexity of running a fully "on demand" VM system and the enormous long-term cost of making dev teams run the operations of both their apps and their runtime environments.

«When my coworker did our first inventory of Docker use we were horrified how many ancient containers were running on the servers.»

But that's exactly the benefit for the ops team of shifting accountability for the "black boxes" to the dev teams: of course handing maintenance of VMs and containers to the dev teams ("DevOps") as a rule results in widespread "abandonware". The dev teams, once they demo the app to "senior management", are not really interested in it any more; the project is "done", and those apps and their runtime environments usually become abandonware. But an "abandonware" strategy is what really saves money (as long as one is not accountable for the consequences), much more so than "consolidation".

If an ops team "goes cloud" and remains accountable for both the hosts and the contents (apps, runtime environments) of the VMs/containers, then the abandonware problem is at least reduced, but then the ops team have just multiplied their accountabilities and made their job a lot more complicated and exhausting. Some people I know have done that because they were too innocent to understand that "consolidation" was the excuse, not the goal, of "going cloud", and now, as one told me, they are accountable for maintaining not just dozens of hosts running very complex VM infrastructure configurations, but also for maintaining 1,200 "not-black-box" VMs containing every possible OS and variant and every possible runtime environment variant, because the dev teams went wild with them and then tossed them over the wall while saying "you are it!" to the ops team.

By cowardlyAnonymous at 2020-11-13 11:35:19:

«Lot of services could manage with the power of original Raspberry Pi, but when you add dual PSUs, harddrives, RAID, IPMI, and all the other features you want if you have to deal with hardware you are already spending in thousands. With physical servers you need to overprovide, because adding more resources is a huge hassle.»

That is really a bizarre and absurd argument, because in model #1 one runs multiple applications and their runtime environments on the same servers, which is possible if they are all managed by the ops teams; the very essence of VMs is to allow management of apps and their runtime environments by multiple dev teams, because each app with its runtime environment is isolated in its own VM (or container). Of course this leads to a large proliferation of apps and a wide divergence of their runtime environments.

«In your two models it would be almost as easy for the ops team to provide OS-less physical servers and let the devs do the rest.» «That does happen in a few places.»

And then one can provide pretty small servers; "blades" and ARM-based micro-servers are targeted precisely at that. I personally think that, if one wants to push app and runtime maintenance to dev teams (in order to reduce costs by turning them into abandonware), that's a much better solution than VMs in a large majority of cases, as it is much simpler and cheaper.

One complication with VMs is that most apps and runtime environments don't have huge requirements for resilience, but if one runs a large pool of virtual resources, one has to gold-plate all physical VM hosts "just in case" one of the VMs they run does have high requirements; that is, all physical hosts end up with "worst case scenario" specifications. If one creates multiple pools with different configurations, then the advantage of a single large pool of idle resources ready to be allocated on demand largely disappears.
