#vK8s 2021 edition – friends don’t let friends run Kubernetes on bare-metal

Three years ago, I wrote a blog post on why you wouldn't want to run Kubernetes on bare-metal. VMware has released a number of platform enhancements since then, and there is a lot of updated material and feedback, much of it coming from customers. So what are my personal reasons to run containers and Kubernetes ("K8s" for short) on a virtual infrastructure, and on vSphere in particular?

Operations: Running multiple clusters on bare-metal is hard

  • Multiple clusters are much easier to run in a virtual environment, and each cluster can follow its own lifecycle policies (e.g. for K8s version upgrades) instead of forcing a single bare-metal cluster to upgrade all at once. Running multiple Kubernetes versions side-by-side might already be a requirement, or become one in the near future.
  • It also makes a lot of sense to run Kubernetes side-by-side with your existing VMs instead of building a new hardware silo and adding operational complexity.
  • VMware's compute platform vSphere is the de-facto standard for datacenter workloads across industries, and operational experience and resources are available around the globe. Bare-metal operations typically introduce new risks and operational complexity.
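The per-cluster lifecycle argument above can be sketched in a few lines. This is a toy illustration, not a real tool: the cluster inventory and version numbers are made up, and a real upgrade plan would also consider the Kubernetes version skew policy between control plane and kubelets.

```python
# Sketch: with one virtual cluster per team, each cluster upgrades on its
# own schedule; on a single shared bare-metal cluster, every workload
# would be forced onto one version at the same time.

def minor(version):
    """Extract the minor version from a 'vMAJOR.MINOR.PATCH' string."""
    return int(version.lstrip("v").split(".")[1])

def upgrade_order(clusters, target_minor):
    """Return cluster names still below the target minor, oldest first."""
    pending = {name: v for name, v in clusters.items() if minor(v) < target_minor}
    return sorted(pending, key=lambda name: minor(pending[name]))

# Hypothetical estate: three virtual K8s clusters on the same vSphere hosts.
estate = {"team-a": "v1.19.4", "team-b": "v1.21.2", "ci": "v1.20.0"}
print(upgrade_order(estate, target_minor=21))  # ['team-a', 'ci']
```

Only `team-a` and `ci` need attention; `team-b` already runs the target version and is left alone, which is exactly what a single shared cluster cannot offer.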

Availability/Resilience and Quality of service: you can plan for failures without compromising density

  • Virtual K8s clusters can benefit even in "two physical datacenters" scenarios where the underlying infrastructure is spread across both sites. A "stretched" platform (e.g. vSphere with vSAN Stretched Cluster) allows you to run logical three-node Kubernetes control planes in VMs and protect the control plane and worker nodes using vSphere HA.
  • vSphere also allows you to prioritize workloads by configuring policies (networking, storage, compute, memory) that will also be enforced during outages (Network I/O Control, Storage I/O Control, Resource Pools, Reservations, Limits, HA Restart Priorities, …)
    • Restart a failed or problematic Kubernetes node VM before Kubernetes itself even detects a problem.
    • Provide Kubernetes control plane availability by utilizing vSphere's mature heartbeat and partition detection mechanisms to monitor servers, Kubernetes VMs, and network connectivity, enabling quick recovery.
    • Prevent service disruptions and performance impacts through proactive failure detection, live migration (vMotion) of VMs, automatic load balancing, restarts after infrastructure failures, and highly available storage.
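The "restart the node VM before Kubernetes reacts" point can be made with back-of-the-envelope timing. All numbers below are illustrative assumptions: the Kubernetes defaults are roughly a 40-second node-monitor grace period plus a 300-second not-ready toleration before pods are evicted, while the HA restart window is a hypothetical figure for one environment.

```python
# Toy recovery-timeline comparison. Timings are assumptions, not benchmarks.

def k8s_reschedule_seconds(grace=40, eviction_toleration=300, pod_start=30):
    """Time until pods from a failed node run elsewhere, using K8s alone:
    node marked NotReady, default toleration expires, pods restart."""
    return grace + eviction_toleration + pod_start

def ha_restart_seconds(failure_detection=30, vm_boot=90, kubelet_ready=30):
    """Time until the same node VM is back, restarted by the hypervisor."""
    return failure_detection + vm_boot + kubelet_ready

print(k8s_reschedule_seconds())  # 370
print(ha_restart_seconds())      # 150
```

Under these assumptions, the hypervisor-level restart brings the node back well before Kubernetes' default eviction path would even have moved the pods.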

Resource fragmentation, overhead & capacity management: single-purpose usage of hardware resources vs. multi-purpose platform

  • Running Kubernetes clusters virtually and using VMware DRS to balance them across vSphere hosts allows multiple K8s clusters to be deployed on the same hardware, increasing utilization of hardware resources.
  • When running multiple K8s clusters on dedicated bare-metal hosts, you lose the ability to utilize hardware resources across the infrastructure pool.
    • Many environments won't be able to quickly repurpose existing capacity from a bare-metal host in one cluster to another cluster.
  • From a vSphere perspective, Kubernetes is yet another set of VMs and capacity management can be done across multiple Kubernetes clusters; it gets more efficient the more clusters you run
    • Deep integrations with existing operational tools like vRealize Operations allow operational teams to deliver Kubernetes with confidence
  • K8s is only a Day-1 scheduler and does not rebalance resources based on running pods.
    • In case of imbalance on the vSphere layer, vSphere DRS rebalances K8s node VMs across the physical estate to better utilize the underlying cluster, delivering the best of both worlds from a scheduling perspective.
  • High availability and "stand-by" systems are cost-intensive in bare-metal deployments, especially in edge scenarios: to provide some level of redundancy, spare physical hardware capacity (servers) needs to be available. In the worst case, you need to reserve capacity per cluster, which increases physical overhead (CAPEX and OPEX) per cluster.
    • vSphere allows you to share failover capacity, including strict admission control to protect important workloads across Kubernetes clusters, because the VMs can be restarted and reprioritized, e.g. based on the scope of a failure.
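The standby-capacity point reduces to simple arithmetic. A toy comparison, with made-up numbers: dedicated bare-metal clusters each reserve their own spare host, while a shared vSphere pool reserves failover capacity once for all clusters.

```python
# Sketch of the failover-capacity argument. Numbers are illustrative only.

def spare_hosts_dedicated(num_clusters, spares_per_cluster=1):
    """Each bare-metal cluster keeps its own standby host(s)."""
    return num_clusters * spares_per_cluster

def spare_hosts_shared(num_clusters, tolerated_host_failures=1):
    """One shared pool reserves failover capacity once, sized for the
    number of simultaneous host failures it must tolerate."""
    return tolerated_host_failures

print(spare_hosts_dedicated(5))  # 5 idle hosts across five clusters
print(spare_hosts_shared(5))     # 1 shared failover host
```

Five dedicated clusters strand five idle servers; the shared pool strands one, and admission control decides which workloads get the capacity when a failure actually happens.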

Single point of integration with the underlying infrastructure

  • A programmable, Software-Defined Datacenter: Infrastructure as Code lets you automate everything on an API-driven datacenter stack.
  • Persistent storage integration would otherwise need to be built for each underlying storage architecture individually; running K8s on vSphere lets you leverage already abstracted and virtualized storage devices.
  • Monitoring of hardware components is specific to individual hardware choices; vSphere offers an abstracted way of monitoring across different hardware generations and vendors.

Security & Isolation

  • vSphere delivers hardware-level isolation at the Kubernetes cluster, namespace, and even pod level
  • VMware infrastructure also enables the pattern of many smaller Kubernetes clusters, providing true multi-tenant isolation with a reduced fault domain. Smaller clusters reduce the blast radius, i.e. any problem with one cluster only affects the pods in that small cluster and won’t impact the broader environment.
  • In addition, smaller clusters mean each developer or environment (test, staging, production) can have their own cluster, allowing them to install their own CRDs or operators without risk of adversely affecting other teams.
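The blast-radius argument in the list above is easy to quantify. A toy sketch, assuming pods are spread evenly across clusters (real placements will vary):

```python
# Same total pod count, different cluster counts: any single cluster
# failure affects fewer pods when clusters are smaller and more numerous.

def blast_radius(total_pods, num_clusters):
    """Pods affected when one cluster fails, assuming an even spread."""
    return total_pods / num_clusters

print(blast_radius(1200, 1))   # 1200.0 pods hit - one big cluster
print(blast_radius(1200, 12))  # 100.0 pods hit - many small clusters
```

The trade-off is more control planes to run, which is precisely the overhead that virtualized clusters on shared hardware keep manageable.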

Credits and further reading

#vK8s – friends don’t let friends run Kubernetes on bare-metal

So, no matter what your favorite Kubernetes framework is these days, I am convinced it runs best on a virtual infrastructure, and of course even better on vSphere. Friends don't let friends run Kubernetes on bare-metal. And what hashtag could summarize this better than something short and crisp like #vK8s? I liked this idea so much that I created some "RUN vK8s" images (inspired by my colleagues Frank Denneman and Duncan Epping – guys, it's been NINE years since RUN DRS!) that I want to share with all of you. You can find the repository on GitHub – feel free to use them wherever you like.

vSphere Integrated Containers 0.9 (OSS version) now available

Great news for everyone who wants to run Docker on vSphere. VMware has released the open-source version of vSphere Integrated Containers 0.9, which is now available via Bintray here.

Please note: this is an interim pre-release and does not include support from VMware global support services (GSS). Support is OSS community level only.

Changes from the 0.8 version are documented here:

You can now go to https://vmware.github.io/vic-product/index.html#getting-started and see the documentation for the latest official product version (in this case VIC Engine 0.8 as part of vSphere Integrated Containers 1.0) and the current OSS release (in this case VIC Engine 0.9).

Supported Docker Commands for 0.9 are listed in the documentation at https://vmware.github.io/vic-product/assets/files/html/latest/vic_app_dev/container_operations.html.

VMware Cloud Foundation – links to key product resources

There has been a lot of VMworld coverage over the last 10 days. I just wanted to summarize some of the key resources around one of the big announcements: VMware Cloud Foundation!

Please note, there will also be a webinar on September 13, 2016, so make sure to check it out if you are looking for more details!


Reset to Standard vSwitch from Distributed vSwitch on homelab Intel NUC

I just had to reset my homelab Intel NUC's ESXi 6.0 network configuration because I wanted to test a specific setting in vSphere Integrated Containers. Unfortunately, the Intel NUC only has one physical uplink, and that uplink (and VMkernel Portgroup) was configured on a Distributed vSwitch – I needed it on a Standard vSwitch for the test. Migrating the VMkernel Portgroup from the Distributed to a Standard vSwitch was a little challenging, and I didn't want to set up an external monitor to use the Direct Console User Interface (DCUI). But with the help of William's ESXi virtual appliance and some hints in the vSphere documentation, I was able to reproduce the necessary keyboard inputs and perform the reset with only a USB keyboard attached to the NUC. Instead of summarizing it only for myself, I thought I'd share it here, as I couldn't find similar instructions on Google.

Please don’t do this in a production environment, blindly configuring a system isn’t a good idea.

tl;dr: the steps are: F2 – TAB – <root_password> – ENTER – DOWN – DOWN – DOWN – DOWN – ENTER – DOWN – ENTER – F11

 

What is actually going on if you could view the DCUI? First, you need to press F2 (and potentially "fn" or similar) to get into ESXi's DCUI system management.

It will ask you to authenticate first (pressing TAB – <root_password> – ENTER).

Then, you need to go to "Network Restore Options" in the System Customization menu (pressing DOWN – DOWN – DOWN – DOWN – ENTER).

And in the "Network Restore Options", you'll have the option to "Restore Standard Switch" (pressing DOWN – ENTER – F11).

After selecting "Standard Switch", you'll need to confirm a dialog with F11, and then a new vSwitch will be created on your host. Mine worked like a charm: I found a new Standard vSwitch with vmk0 using my "old" ESXi management IP address.

On the multi-dimensional evolution of platforms and applications

I'd like to touch on a topic that I am seeing in several of my areas of interest right now. In general, it relates closely to the overall topic of "First, Second and Third Platform", but I'd like to focus more on the individual implications for multiple domains. Over the course of the last few months, I have been involved in several discussions around different platforms and applications as well as their individual evolution and maturity. My personal observation is that both don't necessarily evolve synchronously. Therefore, it is important not only to identify the phase that you are currently in but also to understand the operational implications of the "generation disconnect" between app and platform.

 

Evolution of Platforms

As mentioned above, I'd like to relate my observations to the three platform generations below. I'd like to point out that these three generations are subdivided into several different technologies and can look and feel different in specific use cases or fields of application. The common understanding of the generations is:

[Figure: platform evolution]

 

In addition to these phases, I see different "implementations" of the respective platform generation. Take client-server as one example: this can be a physical-server-only model, but it also stretches to server virtualization and potentially even to "VM-oriented" hosting or cloud services. My friend Massimo also wrote a nice piece on this.

 

Evolution of Applications

One of my key observations is that there is no simple 1:1 connection between applications and platforms. With the rise of 2nd generation platforms, not all applications from the 1st platform were dropped and immediately made available for the next-generation platform. It's actually an evolution for applications that are still business-relevant and therefore make sense to optimize for the next-generation platform. And here comes the important observation: I believe there are (at least) three phases in an application evolution cycle that happens for each platform generation – or potentially even in each concrete implementation of the platform generation. I'll call these phases "Unchanged", "Optimized" and "Purpose-built" for now:

[Figure: application evolution]

But how does that fit into the overall platform picture? I'll try to merge the previous two pictures into one. It also shows a potential application evolution path across platform generations. As you can see, there can be a slight overlap between the "purpose-built" phase of the previous platform and the "unchanged" phase of the next-generation platform.

[Figure: evolution across platform generations]

But let’s move on to two concrete examples that I see applicable.

 

Example 1: Network Functions Virtualization

I'll start with Network Functions Virtualization (NFV). NFV is a Telco industry movement that is supposed to provide hardware independence, new ways of agility, reduced time to market for carrier applications and services, cost reduction, and much more. Basically, it's about delivering the promises of Cloud Computing (and the Third Platform) for the Telco industry (read more about it here). The famous architectural overview as described by ETSI can be seen below:

[Figure: ETSI NFV reference architecture]

NFV differentiates between several functional components such as the actual platform (NFVI = Network Functions Virtualization Infrastructure), the application (VNF = Virtual Network Function), the application manager (VNF manager) and e.g. the orchestration engine.

So how could this look in reality? Let's assume the VNF manager detects a certain usage pattern of its VNF and that the VNF is reaching its maximum scale for the currently deployed number of VNF instances. The VNF manager then talks to the Orchestrator, which could then trigger e.g. the scale-out of the VNF by deploying additional worker instances on the underlying infrastructure/platform resources. The worker instances of the VNF could then automatically be included in the load distribution and have instant integration into the necessary backend services where applicable. All of that happens via open APIs and standardized interfaces – it looks and feels a lot like a typical implementation example for a "third platform" including a "purpose-built" app.

Now for a quick reality check. ETSI's initial NFV whitepaper is from October 2012. It basically describes the destination that the industry is aiming for. And while there might be some examples where VNFs, NFVI and Orchestration are already working hand in hand, there is still a lot of work to do. Some of the "NFV applications" (or VNFs) might have been just "P2V"'ed (1:1 physical-to-virtual conversion) onto a virtualization platform and basically have the same configuration and same identity, kept as close to their physical origins as possible. This allows a VNF provider/vendor to keep existing support procedures and organizations while offering their customers an "NFV 1.0 product" that provides some early benefits of NFV (hardware independence, faster time to market, …). But this also implies that you transfer some of the configurations that made perfect sense in the physical world over to the virtual world – where they only make questionable sense. In this case, I'd actually talk about a move from a "purpose-built" app on the first platform to an "unchanged" app on the second platform.

One example: one physical server in a telco application had 30×300 GB hard disks, 2×4-core CPUs and 128 GB RAM. It never used more than 1 TB of storage, and average utilization was below 4 CPUs and 32 GB RAM. The "unchanged" version of this app would be a 1:1 conversion with all the (unnecessary) resource overhead provided in a virtual machine. The "optimized" version of this app is a right-sized application (so only 1 TB storage, 4 CPUs and 32 GB RAM) that also leverages simple configuration files for installation as well as crash-consistent and persistent data management to allow backup and restore as a VM. But a "purpose-built" version of that app would leverage the underlying NFVI APIs and allow scale-out deployment options based on actual demand, as well as optimizations such as encryption at every layer of the application to enable global deployment models even where lawful interception is relevant.
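The right-sizing arithmetic from the example above is worth spelling out. A minimal sketch using the figures quoted in the text:

```python
# Overprovisioning carried into the "unchanged" P2V conversion, per the
# example: 30 x 300 GB disks vs 1 TB used, 8 cores vs 4, 128 GB vs 32 GB.

def overprovision_factor(provisioned, needed):
    """How many times more resource is allocated than actually used."""
    return provisioned / needed

disks_gb = 30 * 300                          # 9000 GB provisioned
print(overprovision_factor(disks_gb, 1000))  # 9.0x storage
print(overprovision_factor(8, 4))            # 2.0x CPU cores
print(overprovision_factor(128, 32))         # 4.0x RAM (GB)
```

A 9x storage and 4x memory footprint per instance is exactly the overhead the "optimized" phase removes, before the "purpose-built" phase changes the architecture itself.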

 

Example 2: Microservices, Containers & Docker

My next example is microservices and their close friends, containers. They promise a new generation of application architecture and are drivers of the "3rd platform" architecture. One of this movement's famous poster children is Docker. Docker is a great (new) way to package and distribute applications in "containers" that contain applications or just pieces of a larger application architecture. Newly developed applications usually follow a scale-out design; some might be written with something like the "12-factor app" manifesto in mind (or the 15 factors, according to Pivotal). Coming back to the pictures above: a 12-factor app could be considered "purpose-built" for the "third platform".

But how many applications have been built for this? There are many great examples of microservices-oriented applications by the "cloud-native" companies such as Google, Amazon, Facebook and the like. Adrian Cockcroft also gives inspirational talks about these topics around the globe. But I actually expect many applications to stay mainly unchanged, as they are optimized for their current platform. At the same time, some of them might become available as (Docker) containers as part of their next release. But again – if you look into the details, you'll find the same application in a different wrapper. RAR is now ZIP (for my German readers: "Aus Raider wird nun Twix…"). But will these potential "single-container applications" run well on a cloud-native/third-platform architecture? They might not! To put it in a picture:

[Figure: containerizing legacy applications]

So in this case, it is actually important to understand these application limitations and expectations towards the platform (what about data persistence, security, platform resilience, networking, …) to make sure it runs smoothly in production. Coming back to Massimo's blog post: you can run your old Windows NT4 on a public cloud, but does it make sense?

Summary

Just like the continuous evolution of platforms that expose new characteristics and capabilities, there is also an ongoing evolution of applications. It is important to understand the key aspects of the application architecture and its deployment model before making a platform decision. The word "VNF" does not necessarily imply alignment with NFV, and the word "Docker" does not automatically describe a cloud-native or microservices-oriented application.

 

Edits:

18.05.2016: added picture (containerizing legacy applications)

VMware event "Enabling the Digital Enterprise" on February 10 and 11!

On February 10 and 11, 2016, VMware is hosting a large online event titled "Enabling the Digital Enterprise". The agenda promises many exciting topics, including news from End-User Computing (VMware Horizon), the VMware Cloud Management Platform (vRealize solutions), and VMware Virtual SAN.

VMware CEO Pat Gelsinger will be joined by Sanjay Poonen (EVP & GM, End User Computing) and Raghu Raghuram (EVP, Software-Defined Data Center Division) in the respective sessions. A quick look at the agenda:

Track 1 – Deliver and Secure Your Digital Workspace (Pat Gelsinger and Sanjay Poonen)

  • How to transform traditional IT culture, process, tools, and budgets by delivering and managing any app on any device from one platform
  • What’s new with the VMware Horizon portfolio
  • VMware’s new approach for managing your desktop and apps in the cloud

Track 2 – Build and Manage Your Hybrid Cloud (Pat Gelsinger and Raghu Raghuram)

  • How companies are implementing CMPs for intelligent operations, automated IT to IaaS, and DevOps-ready IT
  • VMware’s new streamlined product portfolio
  • Why companies are embracing HCI solutions powered by Virtual SAN

Registration: http://www.vmware.com/digitalenterprise

Free VMware Online Technology Forum 2015 on Wednesday, November 25, 2015


The free VMware Online Technology Forum takes place on November 25 from 10:00 to 14:30 CET!

After a keynote by Joe Baguley, VMware CTO for EMEA, there will be several exciting breakout session tracks with prominent speakers:

  • Software-Defined Data Center: Infrastructure (What’s new in vSphere, vRealize Operations Insight 6.1, EVO:RAIL 2.0, EVO SDDC, Virtual SAN, Virtual Volumes, Site Recovery Manager)
  • Software-Defined Data Center: New Services (vRealize Automation 7.0, VMware Integrated OpenStack 2.0, Cloud-Native Applications & Containers, vRealize Business Update, DevOps with vRealize CodeStream)
  • Software-Defined Networking (NSX 6.2 Update, Network Functions Virtualization (NFV), Micro-Segmentation & NSX Security Partner Integrations, Cross-Data Center NSX, NSX & vRealize Automation)
  • Hybrid Cloud (What’s new in vCloud Air Disaster Recovery, VMware Continuent replication for Oracle, Deep-Dive on vCloud Air Advanced Networking Services, …)
  • Business Mobility (AirWatch 8.1, VMware User Environment Manager Deep-Dive, VMware Horizon Flex, What’s new in Horizon (View) 6.2, Horizon Air, …)

There will also be Hands-On Labs and an Expert Chat Zone.

Further reading:

VMworld tip: Cloud-Native Applications & vSphere Integrated Containers

VMworld 2015 in Barcelona is just around the corner. For me, attending VMworld has always meant looking beyond my own backyard and getting a feel for the current trends and the topics of the coming years. That naturally includes personal exchange with colleagues from other companies, sessions, labs, discussion rounds, and much more.

When asked about the most interesting sessions on the topic of "the future", I have been pointing for the last few weeks to everything happening in the Cloud-Native Applications (CNA) space: the sessions with the "CNA" prefix, the related Hands-On Labs, etc. I will explain in a separate blog post why I consider this topic especially important for vSphere administrators and managers of virtualization teams.

As a teaser and a 3:30-minute summary of one of the most important Cloud-Native Applications announcements from VMworld 2015 in San Francisco, I'd like to point to a video. Enjoy vSphere Integrated Containers:

MWC15: VMware vCloud for NFV with Integrated OpenStack

During Mobile World Congress 2015, VMware announced VMware vCloud for NFV with Integrated OpenStack (Link1 / Link2) – a new offering for Telcos to support their journey and success with Network Functions Virtualization (NFV). Core details from the press release:

 

  • VMware vCloud for NFV Helps CSPs Achieve Sustainable Cost Reductions, Improve Time To Market
  • VMware Offers CSPs a Fast, Simple Path to OpenStack Adoption
  • Multi-Vendor vCloud NFV Platform Supports 40+ Virtual Network Functions from 30+ Vendors

 

The offering is tailored to the needs of Telcos to run and manage a scalable, horizontal NFV Infrastructure (NFVI). It will consist of VMware's proven Software-Defined Datacenter components (vSphere, vRealize Operations, Virtual SAN, NSX, vCloud Director, and the vCloud API) and will also add VMware Integrated OpenStack (VIO).

[Figure: vCloud for NFV components]

To find out more about VMware’s announcements during Mobile World Congress, check out:

To find out more about NFV with VMware, check out the microsite http://www.vmware.com/go/NFV

The VMware vCloud for NFV is the only production proven, multi-vendor NFV cloud platform and supports over 40 Virtual Network Functions (VNFs) from over 30 ecosystem partners.

With the vCloud for NFV Platform, communication service providers (CSPs) can leverage VMware’s industry-leading cloud infrastructure for faster time to market of new and differentiated services while driving sustainable cost reductions through a cloud operations model.

The VMware vCloud for NFV features VMware vSphere®, the industry-defining compute virtualization solution for the cloud; VMware NSX™, the only network virtualization platform that delivers the entire networking and security model from L2-L7 in software; VMware Virtual SAN, software-defined storage that reduces storage CapEx and OpEx; and VMware vCloud Director, a management tool for Telco cloud architectures.

By moving Network Functions Virtualization into production today, VMware customers are accelerating their transformation into next-generation cloud providers, building the operational expertise needed to succeed in the cloud era ahead of their competition.

This transformation is possible as a result of the deep multi-tenancy/multi-vendor capabilities of the VMware platform combined with highly developed operations support services that deliver FCAPS for the cloud and the open application programming interfaces (APIs) for integration northbound to applications and service orchestration platforms.

VMware Launch Event – Feb 2, 2015

What an exciting February! I just want to share some of my initial highlights from last night's launch of vSphere 6.0, Virtual SAN 6.0, VMware Integrated OpenStack and much more! There is so much content – make sure to check out https://www.vmware.com/now for additional broadcasts and materials. You will also find lots of tweets (#VMW28days) and amazing blog posts on the well-known blogs.

vSphere 6.0

Virtual SAN 6.0

https://twitter.com/chuckhollis/status/562356325175525376

https://twitter.com/CaptainVSAN/status/562364473663836160

https://twitter.com/SanDiskDataCtr/status/562392713648017409

VMware Integrated OpenStack

Hands-On Labs Updates