Software-Defined Datacenter

#vK8s 2021 edition – friends don’t let friends run Kubernetes on bare-metal

January 6, 2025September 1, 2021 by Bjoern Brundert

Three years ago, I wrote a blogpost on why you wouldn’t want to run Kubernetes on bare-metal. VMware released a number of platform enhancements over these years and there is a lot of updated material and feedback – also coming from customers. So what are (my personal) reasons to run containers and Kubernetes (short “K8s”) on a virtual infrastructure & vSphere in particular?

Operations: Running multiple clusters on bare-metal is hard

Multiple clusters in a virtual environment are a lot easier and each cluster can leverage e.g. it‘s own lifecycle policies (e.g. for K8s version upgrades) instead of forcing one bare-metal cluster to upgrade. Running multiple Kubernetes versions side-by-side might be already or become a requirement in the near future.
It also makes lots of sense to run Kubernetes side-by-side with your existing VMs instead of building a new hardware silo and operational complexity
VMware’s compute platform vSphere is the de-facto standard for datacenter workloads in companies across industries and operational experience and resources are available across the globe. Bare-metal operations typically introduces new risks and operational complexity.

Availability/Resilience and Quality of service: you can plan for failures without compromising density

Virtual K8s clusters could benefit even in „two physical datacenter” scenarios where the underlying infrastructure is spread across both sites. A “stretched” platform (e.g. vSphere with vSAN Stretched Cluster) allows you to run logical three-node Kubernetes control planes in VMs and protect the control plane and workload nodes using vSphere HA.
vSphere also allows you to prioritize workloads by configuring policies (networking, storage, compute, memory) that will also be enforced during outages (Network I/O Control, Storage I/O Control, Resource Pools, Reservations, Limits, HA Restart Priorities, …)
- Restart a failed or problematic Kubernetes node VM before Kubernetes itself even detects a problem.
- Provide the Kubernetes control plane availability by utilizing mature heartbeat and partition detection mechanisms in vSphere to monitor servers, Kubernetes VMs, and network connectivity to enable quick recovery.
- Prevent service disruption and performance impacts through proactive failure detection, live migration (vMotion) of VMs, automatic load balancing, restart-due-to-infrastructure failures, and highly available storage

Resource fragmentation, overhead & capacity management: single-purpose usage of hardware resources vs. multi-purpose platform

Running Kubernetes clusters virtually and using VMware DRS to balance these clusters across vSphere hosts allows the deployment of multiple K8s cluster on the same hardware setup and increasing utilization of hardware resources
When running multiple K8s clusters on dedicated bare-metal hosts, you lose the overall capability to utilize hardware resources across the infrastructure pool
- Many environments won‘t be able to (quickly) repurpose existing capacity from one bare-metal host in one cluster to another cluster in a short timeframe
From a vSphere perspective, Kubernetes is yet another set of VMs and capacity management can be done across multiple Kubernetes clusters; it gets more efficient the more clusters you run
- Deep integrations with existing operational tools like vRealize Operations allow operational teams to deliver Kubernetes with confidence
K8s is only a Day-1 scheduler and does not perform resource balancing based on running pods
- In case of imbalance on the vSphere layer, vSphere DRS rebalances K8s node VMs across the physical estate to better utilize the underlying cluster and delivers best-of-both-worlds from a scheduling perspective
High availability and „stand-by“ systems are cost intensive in bare-metal deployments, especially in edge scenarios: in order to provide some level of redundancy, some spare physical hardware capacity (servers) need to be available. In worst case you need to reserve capacity per cluster which increases physical overhead (CAPEX and OPEX) per cluster.
- vSphere allows you to share failover capacity incl. incl strict admission control to protect important workloads across Kubernetes clusters because the VMs can be restarted and reprioritized e.g. based on the scope of a failure

Single point of integration with the underlying infrastructure

A programmable, Software-Defined Datacenter: Infrastructure as Code allows to automate all the things on an API-driven datacenter stack
Persistent storage integration would need to be done for each underlying storage architecture individually, running K8s on vSphere allows to leverage already abstracted and virtualized storage devices
Monitoring of hardware components is specific to individual hardware choices, vSphere offers an abstracted way of monitoring across different hardware generations and vendors

Security & Isolation

vSphere delivers hardware-level isolation at the Kubernetes cluster, namespace, and even pod level
VMware infrastructure also enables the pattern of many smaller Kubernetes clusters, providing true multi-tenant isolation with a reduced fault domain. Smaller clusters reduce the blast radius, i.e. any problem with one cluster only affects the pods in that small cluster and won’t impact the broader environment.
In addition, smaller clusters mean each developer or environment (test, staging, production) can have their own cluster, allowing them to install their own CRDs or operators without risk of adversely affecting other teams.

Credits and further reading

The content above is mainly a summary of existing materials & personal observations (summarized by me) and based on lots of pre-work, VMworld presentations and whitepapers from colleagues like Michael Gasch, Frank Denneman, Kit Colbert, Kenny Coleman, Robert Guske, Robbie Jerrom and many more!
There has been a ton of material published around this topic recently (and some awesome foundational work by Michael Gasch incl. his KubeCon talk), I want to list a few of the public resources here:
- Why Choose VMware Virtualization for Kubernetes and Containers (Blogpost, January 2021)
- vSphere with Tanzu Supports 6.3 Times More Container Pods than Bare Metal (Blogpost, August 2021)
  - Full Study/Paper (PDF, August 2021)
- Kubernetes Resource Management for vSphere Admins (Blogpost and VMworld video, November 2019)
- The Value of vSphere in a Kubernetes World (Blogpost)
- Containers on Virtual Machines or Bare-Metal? (Whitepaper)
- Performance of Enterprise Web Applications in Docker Containers on VMware vSphere 6.5 (Blogpost and link to Whitepaper)
- VMs and Containers – Friends or Enemies (Slidedeck by Simone Morellato)
- VMworld 2018: The Value of Running Kubernetes on vSphere (video) (shout out to my friends Michael Gasch and Frank Denneman)

#vK8s – friends don’t let friends run Kubernetes on bare-metal

So, no matter what your favorite Kubernetes framework is these days – I am convinced it runs best on a virtual infrastructure and of course even better on vSphere. Friends don’t let friends run Kubernetes on bare-metal. And what hashtag could summarize this better than something short and crips like #vK8s ? I liked this idea so much that I created some “RUN vK8s” images (inspired by my colleagues Frank Denneman and Duncan Epping – guys, it’s been NINE years since RUN DRS!) that I want to share with all of you. You can find the repository on GitHub – feel free to use them whereever you like.

vSphere Integrated Containers 0.9 (OSS version) now available

January 6, 2025March 20, 2017 by Bjoern Brundert

Great news for everyone that wants to run Docker on vSphere. VMware released the Open Source version of vSphere Integrated Containers 0.9 which is now available via bintray here.

Please note: this is an interim pre-release and does not include support from VMware global support services (GSS). Support is OSS community level only.

Changes from the 0.8 version are documented here:

vSphere HTML 5 Client plugin. See https://blogs.vmware.com/vsphere/2016/12/new-vcenter-management-clients-vsphere-6-5.html for details on the HTML 5 Client
Upgrade from v0.8.0 using vic-machine upgrade. Includes rollback on failure.
Docker Client 1.13 support and Docker API version 1.25 support.
Initial Docker container event support. See https://docs.docker.com/engine/reference/commandline/events/

You can now go to https://vmware.github.io/vic-product/index.html#getting-started and see the documentation for the lastest official product version (in this case VIC Engine 0.8 as part of vSphere Integrated Containers 1.0) and the current OSS release (in this case VIC Engine 0.9).

Supported Docker Commands for 0.9 are listed in the documentation at https://vmware.github.io/vic-product/assets/files/html/latest/vic_app_dev/container_operations.html.

VMware Cloud Foundation – links to key product resources

January 6, 2025September 8, 2016 by Bjoern Brundert

There has been a lot of VMworld coverage over the last 10 days. I just wanted to summarize some of the key resources around one of the big announcements: VMware Cloud Foundation!

Please note, there will also be a webinar on September 13, 2016 so make sure to check this out in case you are looking for some more details on this!

vcf1

vcf2

Reset to Standard vSwitch from Distributed vSwitch on homelab Intel NUC

January 6, 2025August 1, 2016 by Bjoern Brundert

I just had to reset my homelab Intel NUC’s ESXi 6.0 network configuration because I wanted to test a specific setting in vSphere Integrated Containers. Unfortunately, the Intel NUC only has one physical uplink and that uplink (and VMkernel Portgroup) was configured on a Distributed vSwitch – I needed it on a Standard vSwitch for the test. Migrating the VMkernel Portgroup from the Distributed to a Standard vSwitch was a little challenging and I didn’t want to set up an external monitor to use the Direct Console User Interface (DCUI). But with the help of William’s ESXi virtual appliance and some hints in the vSphere documentation, I was able to reproduce the necessary keyboard inputs and perform it with only a USB keyboard attached to the NUC. Instead of summarizing it only for myself, I though I’ll share it here as I couldn’t find similar instructions on google.

Please don’t do this in a production environment, blindly configuring a system isn’t a good idea.

tl;dr: the steps are: F2 – TAB – <root_password> – ENTER – DOWN – DOWN – DOWN – DOWN – ENTER – DOWN – ENTER – F11

What is actually going on if you could view DCUI? First, you need to use/press F2 (and potentially “fn” or similar) to get into ESXi’s DCUI system management:

It will ask you to authenticate first (pressing TAB – <root_password> – ENTER):

Bildschirmfoto 2016-08-01 um 08.15.31

Then, you need to go to “Network Restore Options” in the System Customization menu (pressing DOWN – DOWN – DOWN – DOWN – ENTER):

Bildschirmfoto 2016-08-01 um 07.58.48

And in the “Network Restore Options”, you’ll have the option to “Restore Standard Switch” (pressing DOWN – ENTER – F11):

Bildschirmfoto 2016-08-01 um 07.59.11

After selecting “Standard Switch”, you’ll need to confirm a new dialog with “F11” and then a new vSwitch will be created on your host. Mine worked like a charm, I found a new Standard vSwitch with vmk0 using my “old” management IP address for ESXi.

On the multi-dimensional evolution of platforms and applications

May 18, 2016May 4, 2016 by Bjoern Brundert

I’d like to touch on a topic that I am seeing in several of my areas of interest right now. In general, it’s related a lot to the overall topic of “First, Second and Third Platform” but I’d like to focus more on the individual implications for multiple domains. Over the course of the last few months, I have been involved in several discussions around different platforms and applications as well as their individual evolution and maturity. My personal observation is that both don’t necessarily evolve synchronously. Therefore, it is important to not only identify the phase that you are currently in but also to understand the operational implications of the “generation disconnect” between app and platform.

Evolution of Platforms

As mentioned above, I’d like to relate my observations to the three platform generations below. I’d like to point out that these three generations are subdivided in several different technologies and can look and feel different in specific use-cases or fields of application. The common sense around the generations is:

platform_evolution

In addition to these phases, I see different “implementations” of the respective platform generation. Take Client-Server as one example – this can be a physical-server-only model, this also stretches to server virtualization and potentially even to “VM-oriented” hosting or even cloud services. My friend Massimo also wrote a nice piece on this.

Evolution of Applications

One of my key observations is that there is no simple 1:1 connection between applications and platforms. With the rise of 2nd generation platforms, not all applications from the 1st platform have been dropped and immediately been available for the next-generation platform. It’s actually an evolution for applications that are still business-relevant and therefore make sense to be optimized for the next-generation platform. And here comes the important observation: I believe there are (at least) three phases in an application evolution cycle that is happening for each platform generation – or potentially even in each concrete implementation of the platform generation. I’ll call theses phases “Unchanged”, “Optimized” and “Purpose-built” for now:

application_evolution

But how does that fit in the overall platform picture? I’ll try to merge the previous two pictures into one. It also shows a potential application evolution path across platform generations. As you can see, there can be a slight overlap between “purpose-built” of the previous and the “unchanged” phase of the next-generation platform.

evolution

But let’s move on to two concrete examples that I see applicable.

Example 1: Network Functions Virtualization

I’ll start with Network Functions Virtualization (NFV). NFV is a Telco industry movement that is supposed to provide hardware independence, new ways of agility, reduced time to market for carrier applications & services, cost reduction and much more – basically, it’s about the delivery of promises of Cloud Computing (and Third Platform) for the Telco industry (read more about it here). The famous architectural overview as described by ETSI can be seen below:

ETSI NFV

NFV differentiates between several functional components such as the actual platform (NFVI = Network Functions Virtualization Infrastructure), the application (VNF = Virtual Network Function), the application manager (VNF manager) and e.g. the orchestration engine.

So how could this look in reality? Let’s assume the VNF manager detects a certain usage pattern of its VNF and that VNF is reaching it’s potential maximum scale for the currently deployed amount of VNF instances. The VNF manager then talks to the Orchestrator that could then trigger e.g. the scale-out of the VNF by deploying additional worker instances on the underlying infrastructure/platform resources. The worker instances of the VNF could then automatically be included in the load distribution and have instant integration into necessary backend services where applicable. All of that happens via open APIs and standardized interfaces – it looks and feels a lot like a typical implementation example for a “third platform” including the “purpose-built” app.

Now into a quick reality check. ETSI’s initial NFV whitepaper is from October 2012. It basically describes the destination that the industry is aiming for. And while there might be some examples where VNFs, NFVI and Orchestration are already working hand in hand, there is still a lot of work to do. Some of the “NFV applications” (or VNFs) might have been just „P2V“’ed (1:1 physical to virtual conversion) onto a virtualization platform and basically have the same configuration, same identity and are kept as close to its physical origins as possible. This allows a VNF provider/vendor to keep existing support procedures and organizations while offering their customers a “NFV 1.0 product” that is providing some early benefits of NFV (hardware independence, faster time to market, …). But this also implies that you transfer some of the configurations that made perfect sense in the physical world over to the virtual world – where it only makes questionable sense. In this case, I’d actually talk about a move from a “purpose-built” app from the first platform to an “unchanged” app on the second platform.

One example: one physical server in a telco application had 30*300GB harddisks, had 2*4Core CPUs and 128GB RAM. It never used more than 1TB of storage and average utilization has been below 4 CPUs and 32GB RAM. The “unchanged” version of this app would be a 1:1 conversion with all (unnecessary) resource overhead provided in a virtual machine. The “optimized” version of this app is a right-sized application (so only 1TB storage, 4 CPUs and 32GB RAM) that is also leveraging easy configuration files for installation as well as crash-consistent and persistent data management to allow backup & restore as VM. But a “purpose-built” version of that app would leverage the underlying NFVI APIs, would allow scale-out deployment options based on actual demand as well as optimizations that are e.g. encryption at every layer of the application to ensure global deployment models even in the face of lawful interception relevance, etc.

Example 2: Microservices, Containers & Docker

My next example are microservices and their close friends containers. They are promising a new generation of application architecture and are drivers for the “3rd platform” architecture. One of this movements famous poster-childs is Docker. Docker is a great (new) way to package and distribute applications in “containers” that contain applications or just pieces of a larger application architecture. Newly developed applications usually follow a scale-out design, some might be written with something like the “12 factor app” manifesto in mind (or the 15 factors according to Pivotal). Coming back to the pictures above: a 12 factor app could be considered “purpose-built” for the “third platform”.

But how many applications have been built for this? There are many great examples for microservices-oriented applications by the “cloud-native” companies such as Google, Amazon, Facebook and the likes. Adrian Cockcroft also gives inspirational talks about these topics around the globe. But I actually expect many applications to stay mainly unchanged as they are optimized for their current platform. At the same time, some of them might become available as (Docker) containers as part their next release. But again – if you look into the details, you’ll find the same application in a different wrapper. RAR is now ZIP (for my German readers: “Aus Raider wird nun Twix…”). But will these potentially “single-container-applications” run well on a Cloud-Native/third platform architecture? They might not! To put it in a picture:

Containerizing legacy applications pic.twitter.com/NXeHRhWVcz

— Kenny Bastani (@kennybastani) May 17, 2016

So in this case, it is actually important to understand these application limitations and expectations towards the platform (what about data persistence, security, platform resilience, networking, …) to make sure it runs smoothly in production. Coming back to Massimo’s blogpost – you can run your old Windows NT4 on a Public Cloud, but does it make sense?

Summary

Just like the continuous evolution of platforms that expose new characteristics and capabilities, there is also an ongoing evolution of applications. It is important to understand the key aspects of the application architecture and it’s deployment model before making a platform decision. The word “VNF” does not necessary imply the alignment with NFV and the word “Docker” does not automatically describe a Cloud-Native or microservices-oriented application.

Edits:

18.05.2016: added picture (containerizing legacy applications)

Operations: Running multiple clusters on bare-metal is hard

Availability/Resilience and Quality of service: you can plan for failures without compromising density

Resource fragmentation, overhead & capacity management: single-purpose usage of hardware resources vs. multi-purpose platform

Single point of integration with the underlying infrastructure

Security & Isolation

Credits and further reading

#vK8s – friends don’t let friends run Kubernetes on bare-metal

Share this:

Share this:

Share this:

Share this:

Evolution of Platforms

Evolution of Applications

Example 1: Network Functions Virtualization

Example 2: Microservices, Containers & Docker

Summary

Share this: