#vK8s 2021 edition – friends don’t let friends run Kubernetes on bare-metal

Three years ago, I wrote a blogpost on why you wouldn’t want to run Kubernetes on bare-metal. VMware has released a number of platform enhancements since then, and there is a lot of updated material and feedback, much of it coming from customers. So what are my (personal) reasons to run containers and Kubernetes (“K8s” for short) on a virtual infrastructure, and on vSphere in particular?

Operations: Running multiple clusters on bare-metal is hard

  • Multiple clusters are a lot easier to run in a virtual environment, and each cluster can follow its own lifecycle policies (e.g. for K8s version upgrades) instead of forcing a single bare-metal cluster to upgrade. Running multiple Kubernetes versions side by side might already be a requirement, or become one in the near future.
  • It also makes a lot of sense to run Kubernetes side by side with your existing VMs instead of building a new hardware silo and adding operational complexity.
  • VMware’s compute platform vSphere is the de-facto standard for datacenter workloads in companies across industries, and operational experience and resources are available across the globe. Bare-metal operations typically introduce new risks and operational complexity.

Availability/Resilience and Quality of service: you can plan for failures without compromising density

  • Virtual K8s clusters can benefit even in “two physical datacenters” scenarios where the underlying infrastructure is spread across both sites. A “stretched” platform (e.g. vSphere with a vSAN Stretched Cluster) allows you to run logical three-node Kubernetes control planes in VMs and protect both the control plane and the worker nodes using vSphere HA.
  • vSphere also allows you to prioritize workloads by configuring policies (networking, storage, compute, memory) that are also enforced during outages (Network I/O Control, Storage I/O Control, Resource Pools, Reservations, Limits, HA Restart Priorities, …):
    • Restart a failed or problematic Kubernetes node VM before Kubernetes itself even detects a problem.
    • Provide Kubernetes control plane availability by utilizing vSphere’s mature heartbeat and partition-detection mechanisms to monitor servers, Kubernetes node VMs, and network connectivity, enabling quick recovery.
    • Prevent service disruption and performance impacts through proactive failure detection, live migration (vMotion) of VMs, automatic load balancing, restarts due to infrastructure failures, and highly available storage.

Resource fragmentation, overhead & capacity management: single-purpose usage of hardware resources vs. multi-purpose platform

  • Running Kubernetes clusters virtually and using VMware DRS to balance them across vSphere hosts allows you to deploy multiple K8s clusters on the same hardware and increases the utilization of the hardware resources.
  • When running multiple K8s clusters on dedicated bare-metal hosts, you lose the overall capability to utilize hardware resources across the infrastructure pool
    • Many environments won’t be able to quickly repurpose capacity from a bare-metal host in one cluster to another cluster.
  • From a vSphere perspective, Kubernetes is yet another set of VMs and capacity management can be done across multiple Kubernetes clusters; it gets more efficient the more clusters you run
    • Deep integrations with existing operational tools like vRealize Operations allow operational teams to deliver Kubernetes with confidence
  • K8s is only a Day-1 scheduler and does not perform resource balancing based on running pods
    • In case of imbalance on the vSphere layer, vSphere DRS rebalances the K8s node VMs across the physical estate to better utilize the underlying cluster, delivering the best of both worlds from a scheduling perspective.
  • High availability and “stand-by” systems are cost-intensive in bare-metal deployments, especially in edge scenarios: to provide some level of redundancy, spare physical hardware capacity (servers) needs to be available. In the worst case you need to reserve capacity per cluster, which increases the physical overhead (CAPEX and OPEX) per cluster.
    • vSphere allows you to share failover capacity, including strict admission control to protect important workloads, across Kubernetes clusters, because VMs can be restarted and reprioritized, e.g. based on the scope of a failure.

Single point of integration with the underlying infrastructure

  • A programmable, Software-Defined Datacenter: Infrastructure as Code lets you automate everything on an API-driven datacenter stack (see the sketch after this list).
  • Persistent storage integration would need to be done for each underlying storage architecture individually; running K8s on vSphere lets you leverage already abstracted and virtualized storage devices.
  • Monitoring of hardware components is specific to individual hardware choices; vSphere offers an abstracted way of monitoring across different hardware generations and vendors.
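
To make the “API-driven datacenter” point a bit more concrete, here is a minimal sketch using govc (the CLI from the govmomi project). The connection variables and inventory paths are assumptions; adjust them to your environment.

# Assumed connection details for your vCenter endpoint
export GOVC_URL=vcenter.example.com
export GOVC_USERNAME=administrator@vsphere.local
export GOVC_PASSWORD='********'

# List the node VMs of a (hypothetical) Kubernetes cluster and inspect one of them
govc find /Datacenter/vm -type m -name 'k8s-*'
govc vm.info k8s-worker-01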

Security & Isolation

  • vSphere delivers hardware-level isolation at the Kubernetes cluster, namespace, and even pod level
  • VMware infrastructure also enables the pattern of many smaller Kubernetes clusters, providing true multi-tenant isolation with a reduced fault domain. Smaller clusters reduce the blast radius, i.e. any problem with one cluster only affects the pods in that small cluster and won’t impact the broader environment.
  • In addition, smaller clusters mean each developer or environment (test, staging, production) can have their own cluster, allowing them to install their own CRDs or operators without risk of adversely affecting other teams.

Credits and further reading

#vK8s – friends don’t let friends run Kubernetes on bare-metal

So, no matter what your favorite Kubernetes framework is these days – I am convinced it runs best on a virtual infrastructure and, of course, even better on vSphere. Friends don’t let friends run Kubernetes on bare-metal. And what hashtag could summarize this better than something short and crisp like #vK8s? I liked this idea so much that I created some “RUN vK8s” images (inspired by my colleagues Frank Denneman and Duncan Epping – guys, it’s been NINE years since RUN DRS!) that I want to share with all of you. You can find the repository on GitHub – feel free to use them wherever you like.

VMware Project Pacific – collection of materials

Blogposts:

VMworld US 2019:

VMworld Europe 2019 sessions:

  • HBI1452BE – Project Pacific: Supervisor Cluster Deep Dive – STREAM DOWNLOAD
  • HBI1761BE – Project Pacific 101: The Future of vSphere – STREAM DOWNLOAD
  • HBI4500BE – Project Pacific: Guest Clusters Deep Dive – STREAM DOWNLOAD
  • HBI4501BE – Project Pacific: Native Pods Deep Dive – STREAM DOWNLOAD
  • HBI4937BE – Introducing Project Pacific: Transforming vSphere into the App Platform of the Future – STREAM DOWNLOAD
  • KUB1840BE – Run Kubernetes Consistently Across Clouds with Tanzu & Project Pacific – STREAM DOWNLOAD
  • KUB1851BE – Managing Clusters: Project Pacific on vSphere & Tanzu Mission Control – STREAM DOWNLOAD

Podcasts:

Labs / Hands-On:

  • HOL-2013-01-SDC – Project Pacific – Lightning Lab: https://labs.hol.vmware.com/HOL/catalogs/lab/6877

Other interesting sources:

Feel free to reach out if you are missing any interesting sessions here – happy to update this post anytime! @bbrundert

Deploying kubeapps helm chart on VMware Enterprise PKS (lab deployment!)

With the recent announcement of VMware and Bitnami joining forces, I wanted to revisit the kubeapps project on Enterprise PKS earlier today. I followed the community documentation but initially ran into some smaller issues with the MongoDB deployment (see my GitHub comments here).

UPDATE: At first I thought you needed to enable privileged containers in PKS but actually you don’t have to do that! There was a typo in my configuration which led to an unknown flag for the MongoDB deployment. I used the flag “mongodb.securityContext.enable=false” when deploying the Helm chart but it should have been “mongodb.securityContext.enabled=false”. Thanks to Andres from the Bitnami team for catching this! The instructions below have been updated!

Install Helm
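
The post assumes a working Helm 2 client (this setup still uses Tiller). If you need to install it first, a minimal sketch for a Linux jump host could look like this; the version number is an assumption, so pick the latest 2.x release from the Helm releases page:

# Download and unpack the Helm 2 client (version is an assumption)
curl -LO https://get.helm.sh/helm-v2.14.1-linux-amd64.tar.gz
tar -xzf helm-v2.14.1-linux-amd64.tar.gz
sudo mv linux-amd64/helm /usr/local/bin/helm
helm version --client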

Add the bitnami repo:

helm repo add bitnami https://charts.bitnami.com/bitnami

Add a “kubeapps” namespace to deploy into

kubectl create namespace kubeapps

Add a Service Account for Tiller

vi rbac-config-tiller.yaml

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system

kubectl create -f rbac-config-tiller.yaml

Leverage the newly created service account for Tiller:

helm init --service-account tiller
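
To confirm that Tiller came up and uses the new service account, a quick check could look like this (tiller-deploy is the default Helm 2 deployment name):

# Tiller runs as a deployment in kube-system
kubectl -n kube-system get deploy tiller-deploy
helm version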

Create Service account for kubeapps-operator

kubectl create serviceaccount kubeapps-operator 

kubectl create clusterrolebinding kubeapps-operator \
--clusterrole=cluster-admin \
--serviceaccount=default:kubeapps-operator

kubectl get secret $(kubectl get serviceaccount kubeapps-operator -o jsonpath='{.secrets[].name}') -o jsonpath='{.data.token}' | base64 --decode

Copy the secret for use in the kubeapps dashboard later on.

Since NSX-T brings an out-of-the-box capability for exposing kubeapps on an external IP address, we can use a LoadBalancer service and skip the port-forwarding section of the documentation. Following what I found in another bug report, I also set an extra flag to disable IPv6 in the MongoDB deployment:

helm install --name kubeapps --namespace kubeapps bitnami/kubeapps \
--set frontend.service.type=LoadBalancer \
--set mongodb.securityContext.enabled=false \
--set mongodb.mongodbEnableIPv6=false

After a few minutes, the deployed services & deployments should be up and running:
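
If you prefer checking from the CLI, something along these lines should do; the namespace matches the helm install above, while the exact service names depend on the chart version:

# Watch pods, deployments, and services in the kubeapps namespace come up
kubectl get pods,deployments,services -n kubeapps
# The frontend service should receive an external IP from the NSX-T load balancer
kubectl get services -n kubeapps | grep LoadBalancer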

Then follow part three of the instructions to access the dashboard.

2019-05-13 – Cloud Native Short Takes

Hello everyone and welcome to my first Cloud Native Short Take. Following the spirit of my previous efforts, I’d like to share some interesting links and observations that I came across recently. So, let’s get right into it:

  • Red Hat Summit carried some interesting updates for customers that run OpenShift on VMware today or plan to do it in the future. There was a joint announcement of a reference architecture for OpenShift on the VMware SDDC. Read more about it on the VMware Office of the CTO Blog, the VMware vSphere Blog as well as the Red Hat Blog.
  • Speaking of announcements, GitHub just announced “GitHub Package Registry” – a new service that will allow users to bring their packages right to their code. As GitHub puts it: “GitHub Package Registry is a software package hosting service, similar to npmjs.org, rubygems.org, or hub.docker.com, that allows you to host your packages and code in one place.”
  • My friends at Wavefront launched a new capability around observability in microservices land. Check out their blogpost around Service Maps in their Wavefront 3D Observability offering that combines metrics, distributed tracing and histograms. There is also a pretty cool demo on Youtube linked from that post – it’s beautiful!
  • Following the motto “Kubernetes, PKS, and Cloud Automation Services – Better Together!”, the VMware Cloud Automation Services team released a beta integration with Enterprise PKS. Read more about it on their blog and watch the webinar for more details.
  • My friend Cormac is a fantastic resource in all-things cloud-native storage these days. And thankfully, he shares lots of his own discoveries on his blog. His latest post is focused on testing Portworx’ STORK for doing K8s volume snapshots in an on-prem vSphere environment. Read more about it here. Looking forward to the next post which will include some integration testing with Velero.
  • Speaking of Velero (formerly known as Ark), this project is heading to a version 1.0 release! I am very excited for the team! You can find the first Release Candidate here.
  • And coming back to Cormac’s blog – he just released a “Getting started with Velero 1.0-RC1” blogpost with his test deployment running Cassandra on PKS on vSphere (leveraging Restic).
  • The Kubernetes 1.15 enhancement tracking is now locked down. You can find the document on Google Docs
  • I came across an interesting talk on InfoQ titled “The Life of a Packet through Istio”
  • Another interesting announcement came from Red Hat and Microsoft around a project called KEDA. KEDA “allows for fine grained autoscaling (including to/from zero) for event driven Kubernetes workloads. KEDA serves as a Kubernetes Metrics Server and allows users to define autoscaling rules using a dedicated Kubernetes custom resource definition”. A very interesting project, check out the blogpost and a TGIK episode from Kris Nóva last Friday.
  • There is some useful material around the Certified Kubernetes Administrator exam in this little study guide
  • Oh and speaking of enablement: I can only recommend you check out the freshly published book “Cloud Native Patterns” by the amazing Cornelia Davis on Manning.com. I have been following the development of that book via the “MEAP” program and it’s a pretty great source of information!
  • Several thoughts on choosing the right Serverless Platform

Operationalizing VMware PKS with BOSH – how to get started

I have installed VMware PKS in a variety of environments, and I typically show something that helps platform operators running PKS dive even deeper into the status of PKS components, beyond what the pks CLI offers. One of the key lifecycle components in PKS is called BOSH. BOSH deploys the Kubernetes masters and workers and performs a number of other tasks. So what is the easiest way to get access to BOSH?

Step 1)

  • Log in to the Ops Manager VM via SSH: ssh ubuntu@your.opsmanager.com

Step 2)

  • Open Ops Manager and click on the BOSH Director tile: 
  • Click on the “Credentials” Tab and search for “BOSH Commandline Credentials”:
  • You will see output similar to this one:
    {"credential":"BOSH_CLIENT=ops_manager 
    BOSH_CLIENT_SECRET=ABCDEFGhijklmnopQRSTUVWxyz 
    BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate 
    BOSH_ENVIRONMENT=192.168.1.100 bosh "}
  • Copy and paste that line and reformat it the following way:
    BOSH_CLIENT=ops_manager 
    BOSH_CLIENT_SECRET=ABCDEFGhijklmnopQRSTUVWxyz 
    BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate 
    BOSH_ENVIRONMENT=192.168.1.100


  • The easiest way to get started every time is to make it part of your .bashrc configuration by doing the following:
    • edit your .bashrc and append the outputs from above like this:
    • export BOSH_CLIENT=ops_manager 
      export BOSH_CLIENT_SECRET=ABCDEFGhijklmnopQRSTUVWxyz 
      export BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate 
      export BOSH_ENVIRONMENT=192.168.1.100
    • log out and log in again (or just run the export commands manually on the CLI once)


  • Some example commands on how to interact with BOSH (and a nice cheat sheet at https://github.com/DennyZhang/cheatsheet-bosh-A4):
  • bosh deployments
    # PKS control plane deployment (its name starts with "pivotal-container-service")
    PKS=$(bosh deployments | grep ^pivotal | awk '{print $1;}')
    bosh -d $PKS vms
    bosh -d $PKS instances
    bosh -d $PKS tasks
    bosh -d $PKS tasks -ar
    bosh -d $PKS task 724
    bosh -d $PKS task 724 --debug

    # Kubernetes cluster deployments (their names start with "service-instance")
    CLUSTER=$(bosh deployments | grep ^service-instance | awk '{print $1;}')
    bosh -d $CLUSTER vms
    bosh -d $CLUSTER vms --vitals
    bosh -d $CLUSTER tasks --recent=9
    bosh -d $CLUSTER task 2009 --debug
    bosh -d $CLUSTER ssh master/0
    bosh -d $CLUSTER ssh worker/0
    bosh -d $CLUSTER logs
    bosh -d $CLUSTER cloud-check


  • Advanced users: you can also install the BOSH CLI on an admin VM and run it from there (a minimal sketch follows below):
    • Download from https://github.com/cloudfoundry/bosh-cli/releases
    • Copy the certificate from the Ops Manager VM (/var/tempest/workspaces/default/root_ca_certificate) to your admin VM and edit the .bashrc environment variables accordingly
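
A minimal sketch of that approach on a Linux admin VM (the release version is an assumption; check the GitHub releases page for the current one):

# Download the bosh CLI binary and make it executable (version is an assumption)
VERSION=6.4.1
curl -Lo bosh https://github.com/cloudfoundry/bosh-cli/releases/download/v${VERSION}/bosh-cli-${VERSION}-linux-amd64
chmod +x bosh && sudo mv bosh /usr/local/bin/
# Reuse the BOSH_* variables from the .bashrc section above, pointing BOSH_CA_CERT at the copied certificate
bosh env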