vSphere with Tanzu

Direct Org Network to TKC Network Communication in Cloud Director 10.2

Since VMware introduced vSphere with Tanzu support in VMware Cloud Director 10.2, I have been struggling to find a proper way to give customers bidirectional communication between Virtual Machines and Pods. In earlier Kubernetes implementations using the Container Service Extension (CSE) "Native Cluster", the workers and the control plane were placed directly in Organization networks. Communication between Pods and Virtual Machines was quite easy, even when they were in different subnets, because traffic could be routed through the Tier1 Gateway.

With Tanzu meeting VMware Cloud Director, Kubernetes Clusters have their own Tier1 Gateway. It would be technically possible to implement routing between the Tanzu and VCD Tier1s through the Tier0. However, the typical Cloud Director Org Network is hidden behind a NAT, and tenants are free to choose their private ranges. There is simply no way to prevent overlapping networks when advertising the Tier1 Routers to the upstream Tier0. The following diagram shows the VCD networking with Tanzu enabled.

With Cloud Director 10.2.2, VMware further optimized the implementation by automatically creating Firewall Rules on the TKC Tier1 that only allow the tenant's Org Networks to access Kubernetes services. VMware also published a guide on how customers can NAT their public IP addresses to TKC Ingress addresses to make services accessible from the Internet. The method is described here (see Publish Kubernetes Services using VCD Org Networks). Unfortunately, communication from Pods to Virtual Machines in VCD still does not seem to be in VMware's scope.

While developing a decent solution using Kubernetes Endpoints, I also came up with a questionable workaround. I highly doubt that these methods are supported or useful in production, but I still want to share them to show what could actually be possible.
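To make the overlap problem concrete, here is a minimal Python sketch that checks which prefixes would collide if several Tier1 Gateways advertised their networks to one shared Tier0. The Tier1 names and CIDRs are hypothetical examples, not values from a real VCD setup. The sketch only uses the standard library `ipaddress` module.

```python
import ipaddress
from itertools import combinations

# Hypothetical networks advertised by three Tier1 Gateways to one shared Tier0.
# Tenants pick private ranges freely, so duplicates and overlaps are common.
ADVERTISED = {
    "tenant-a-t1": ["192.168.10.0/24", "10.0.0.0/24"],
    "tenant-b-t1": ["192.168.10.0/24"],
    "tkc-t1": ["10.0.0.0/16"],
}


def find_conflicts(advertised):
    """Return every pair of overlapping prefixes advertised by different Tier1s.

    Each result is (tier1_a, prefix_a, tier1_b, prefix_b).
    """
    routes = [
        (tier1, ipaddress.ip_network(prefix))
        for tier1, prefixes in advertised.items()
        for prefix in prefixes
    ]
    conflicts = []
    for (t1_a, net_a), (t1_b, net_b) in combinations(routes, 2):
        # Overlaps inside a single Tier1 are ignored here; only collisions
        # between different Tier1s make the Tier0 routes ambiguous.
        if t1_a != t1_b and net_a.overlaps(net_b):
            conflicts.append((t1_a, str(net_a), t1_b, str(net_b)))
    return conflicts


if __name__ == "__main__":
    for t1_a, net_a, t1_b, net_b in find_conflicts(ADVERTISED):
        print(f"conflict: {t1_a} {net_a} <-> {t1_b} {net_b}")
```

Behind a NAT, none of these collisions matter, because only the translated public addresses leave the Tier1. That is exactly why Org Networks cannot simply be advertised.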

Read More »Direct Org Network to TKC Network Communication in Cloud Director 10.2

Access Org Network Services from TKC Guest Cluster in VMware Cloud Director with Tanzu

Many applications running on container platforms still require external resources such as databases. In the last article, I explained how to access TKC resources from VMware Cloud Director Tenant Org Networks. In this article, I'm going to explain how to access a database running on a Virtual Machine in VMware Cloud Director from a Tanzu Kubernetes Cluster. The cluster was deployed using the latest Container Service Extension (CSE) in VMware Cloud Director 10.2.

If you are not familiar with the vSphere with Tanzu integration in VMware Cloud Director, the following diagram shows the communication paths. I have a single VCD Org with a MySQL Server running in an Org network. When traffic leaves the Org Network, the private IP address is translated (SNAT) to a public IP from the VCD external network (203.0.113.0/24). The customer also has a Tanzu Kubernetes Cluster (TKC) deployed using VMware Cloud Director. This creates another Tier1 Gateway, which is connected to the same upstream Tier0 Router. Traffic leaving the TKC is also translated on its Tier1, using an address from the Egress Pool (10.99.200.0/24).

As a result, the two networks cannot communicate with each other directly. As of VMware Cloud Director 10.2.2, communication only works in one direction: Org Network -> TKC. This is implemented by automatically configuring an SNAT rule on the Org Tier1 that translates traffic to the gateway's primary public address. Using this address, the Org Network can reach all Kubernetes services that are exposed with an address from the Ingress Pool, which is the default when exposing services in a TKC.
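For the opposite direction (TKC -> Org Network), one common Kubernetes pattern for reaching an external endpoint from Pods is a Service without a selector plus a manually created Endpoints object with the same name. The sketch below builds both objects in Python. It assumes, hypothetically, that the MySQL VM is reachable from the TKC at 203.0.113.10:3306, for example through a DNAT rule on the Org Tier1. It illustrates the pattern and is not necessarily the exact setup from the full article.

```python
import json

# Hypothetical target: the MySQL VM, reachable from the TKC at this address
# (for example via a DNAT rule on the Org Tier1). Not a value from the article.
DB_IP = "203.0.113.10"
DB_PORT = 3306


def external_service(name, namespace, ip, port, port_name="mysql"):
    """Build a selector-less Service and a matching Endpoints object.

    Because the Service has no selector, Kubernetes does not manage its
    Endpoints. The Endpoints object below (same name, same namespace) tells
    the cluster where to send traffic addressed to the Service.
    """
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "ports": [{"name": port_name, "protocol": "TCP", "port": port, "targetPort": port}]
        },
    }
    endpoints = {
        "apiVersion": "v1",
        "kind": "Endpoints",
        "metadata": {"name": name, "namespace": namespace},
        "subsets": [
            {
                "addresses": [{"ip": ip}],
                "ports": [{"name": port_name, "protocol": "TCP", "port": port}],
            }
        ],
    }
    return [service, endpoints]


if __name__ == "__main__":
    items = external_service("mysql-external", "default", DB_IP, DB_PORT)
    # A v1 List can be piped straight into: kubectl apply -f -
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Pods would then connect to `mysql-external.default` on port 3306. If the database address changes, only the Endpoints object needs to be updated.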

Read More »Access Org Network Services from TKC Guest Cluster in VMware Cloud Director with Tanzu

VMware Cloud Director 10.2.2 and vSphere with Tanzu Enhancements

VMware Cloud Director 10.2.2 brings a couple of enhancements to the vSphere with Tanzu integration. We are still waiting for VRF support in vSphere with Tanzu to fully separate Supervisor Namespaces. Even so, the implementation introduced in VCD 10.2.2 should be suitable for production workloads.

This article explains the new features and the issues I ran into during the implementation:

  • VCD with Supervisor Control Plane communication
  • Tanzu Certificate Issues
  • Tanzu Kubernetes Cluster Tenant Network Isolation
  • Publish Kubernetes Services using VCD Org Networks

Read More »VMware Cloud Director 10.2.2 and vSphere with Tanzu Enhancements

vSphere with Tanzu 7.0U2a - TKC Deployment fails with VirtualMachineClassBindingNotFound

Since the latest update of vSphere with Tanzu to version 7.0 U2a, the deployment of Tanzu Kubernetes Clusters fails with the following condition:

  Conditions:
    Last Transition Time:  2021-05-05T18:19:10Z
    Message:               1 of 2 completed
    Reason:                VirtualMachineClassBindingNotFound @ Machine/tkc-dev-control-plane-wxd57
    Severity:              Error
    Status:                False
    Message:               0/1 Control Plane Node(s) healthy. 0/2 Worker Node(s) healthy
Events:
  Type    Reason        Age    From                                                                                             Message
  ----    ------        ----   ----                                                                                             -------
  Normal  PhaseChanged  7m22s  vmware-system-tkg/vmware-system-tkg-controller-manager/tanzukubernetescluster-status-controller  cluster changes from creating phase to failed phase

The problem seems to be related to the newly introduced VM Service. In previous versions, all Virtual Machine classes were automatically available in all namespaces. With the new VM Service, you can create custom classes and assign them to namespaces. When a VirtualMachineClass is added to a namespace (using the VM Service card), a VirtualMachineClassBinding is created in the developer's namespace. This binding is required not only for Virtual Machines created by the VM Service but also for deploying TKC Clusters.

Read More »vSphere with Tanzu 7.0U2a - TKC Deployment fails with VirtualMachineClassBindingNotFound
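A quick way to spot this condition before deploying is to compare the VM classes referenced by the TKC with the bindings in its namespace. The sketch below works on the JSON output of `kubectl get tanzukubernetescluster <name> -n <namespace> -o json` and `kubectl get virtualmachineclassbindings -n <namespace> -o json`. It relies on two assumptions you should verify against your version:

- the TKC uses the v1alpha1 layout, with `spec.topology.controlPlane.class` and `spec.topology.workers.class`;
- each binding references its class through `classRef.name`.

```python
def missing_class_bindings(tkc, bindings):
    """Return VM classes used by a TKC that have no VirtualMachineClassBinding.

    Assumes the v1alpha1 TanzuKubernetesCluster layout and that each binding
    names its class in classRef.name (falling back to the binding's own name).
    """
    topology = tkc["spec"]["topology"]
    # Classes the control plane and worker nodes need.
    required = {topology["controlPlane"]["class"], topology["workers"]["class"]}
    # Classes that are actually bound to the namespace.
    bound = {
        item.get("classRef", {}).get("name", item["metadata"]["name"])
        for item in bindings.get("items", [])
    }
    return sorted(required - bound)
```

Load both documents with `json.load` and pass them in. An empty list means every class the cluster needs is bound to the namespace. Any names returned must be added to the namespace through the VM Service card before the deployment can succeed.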

How to Migrate SupervisorControlPlaneVM in vSphere with Tanzu

When you try to migrate the Control Plane of a Workload Management-enabled vSphere 7 cluster using vMotion or Storage vMotion, the following warning is displayed:

"This option is not available because you do not have the required permissions."

This article explains why manual migrations of the SupervisorControlPlaneVM shouldn't be necessary in general and how to work around the limitation if you still want to migrate it manually.

Read More »How to Migrate SupervisorControlPlaneVM in vSphere with Tanzu

How to Create VM Service Templates in vSphere with Tanzu

When you try to deploy custom images using the VM Service in vSphere with Tanzu, the following error is displayed:

Error from server (GuestOS not supported for osType other3xLinux64Guest on image photon-hw11-4.0-1526e30ba0 or VMImage is not compatible with v1alpha1 or is not a TKG Image): error when creating "vmsvc-photon.yaml": admission webhook "default.validating.virtualmachine.vmoperator.vmware.com" denied the request: GuestOS not supported for osType other3xLinux64Guest on image photon-hw11-4.0-1526e30ba0 or VMImage is not compatible with v1alpha1 or is not a TKG Image

Only images that VMware provides in its Marketplace are supported for deployment with the VM Operator. The reason for this limitation is that the template needs to be prepared for use with OVF options and cloud-init. As of today, the only available image is CentOS 8.

If you want to use your own images, there are only two hard requirements: the Virtual Machine must boot with DHCP, and SSH must be enabled so you can access it. In this article, I explain how to modify the official PhotonOS image so that it can be used with the VM Service.
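For the SSH side of that requirement, a cloud-init enabled template typically receives its configuration as base64-encoded user-data. The following Python sketch renders a minimal `#cloud-config` that creates a user with an SSH public key and then encodes it. Several details are assumptions rather than part of the original method:

- how the encoded string is handed to the VM (for example through an OVF `user-data` property), which depends on how the template was prepared;
- the user name;
- the key.

```python
import base64


def build_user_data(username, ssh_public_key):
    """Render a minimal #cloud-config and return it base64-encoded.

    Creates one user that can log in with the given SSH public key and
    disables SSH password authentication. The image itself still has to
    bring up its NIC with DHCP and run sshd.
    """
    cloud_config = "\n".join([
        "#cloud-config",
        "ssh_pwauth: false",
        "users:",
        f"  - name: {username}",
        "    sudo: ALL=(ALL) NOPASSWD:ALL",
        "    shell: /bin/bash",
        "    ssh_authorized_keys:",
        f"      - {ssh_public_key}",
        "",
    ])
    return base64.b64encode(cloud_config.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    # Placeholder key; replace it with your own public key.
    print(build_user_data("photon", "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5EXAMPLE admin@example"))
```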

Read More »How to Create VM Service Templates in vSphere with Tanzu

Getting Started with vSphere with Tanzu - VM Service

With the release of vCenter 7.0 U2a, VMware introduced the VM Service. The VM Service runs on top of vSphere with Tanzu and allows developers to deploy Virtual Machines using kubectl declarative object configuration. The underlying Kubernetes VM Operator was already available in previous versions, but direct deployment of Virtual Machines was not supported. If you have deployed a TKC using the Tanzu Kubernetes Grid Service, you were already using the VM Operator under the hood.

In a previous article, I explained how to deploy Virtual Machines using kubectl before the VM Service was available. If you are familiar with the method explained there, you will find a lot of similarities.
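As an illustration of the declarative approach, this Python sketch builds a `VirtualMachine` object for the VM Operator and prints it as JSON, which `kubectl apply -f -` accepts. The field names follow the `vmoperator.vmware.com/v1alpha1` API as I understand it. The image, storage class and ConfigMap names are hypothetical placeholders, and `networkType: nsx-t` assumes an NSX-T backed Supervisor Cluster.

```python
import json


def vm_manifest(name, namespace, image, vm_class, storage_class, config_map):
    """Build a VM Service VirtualMachine object as a plain dict.

    Field names assume the vmoperator.vmware.com/v1alpha1 API shipped with
    vSphere 7.0 U2a; verify them against the CRD in your environment.
    """
    return {
        "apiVersion": "vmoperator.vmware.com/v1alpha1",
        "kind": "VirtualMachine",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "imageName": image,          # an image available to the namespace
            "className": vm_class,       # needs a VirtualMachineClassBinding in the namespace
            "storageClass": storage_class,
            "powerState": "poweredOn",
            # Assumes an NSX-T backed Supervisor Cluster.
            "networkInterfaces": [{"networkType": "nsx-t"}],
            # ConfigMap with guest customization, passed via the OVF environment.
            "vmMetadata": {"configMapName": config_map, "transport": "OvfEnv"},
        },
    }


if __name__ == "__main__":
    vm = vm_manifest("demo-vm", "dev", "centos-8-image", "best-effort-small",
                     "vsan-default-storage-policy", "demo-vm-config")
    print(json.dumps(vm, indent=2))  # pipe into: kubectl apply -f -
```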

Read More »Getting Started with vSphere with Tanzu - VM Service

Quick Tip: kubectl vsphere login without entering a Password

With the release of vSphere 7.0 Update 2, a new version of the vSphere authentication plugin for kubectl has been released. The new plugin, which can be downloaded from the Supervisor Control Plane after enabling Workload Management, has a neat new feature that allows you to save the password in an environment variable.
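For scripted logins, the password can be placed only in the environment of the child process instead of being exported in the shell. This Python sketch assumes the plugin reads `KUBECTL_VSPHERE_PASSWORD`; check the help output of your plugin version. The server address and user name are placeholders.

```python
import os


def build_login(server, username, password, insecure=False):
    """Return (argv, env) for a non-interactive `kubectl vsphere login`.

    The password goes only into the child's environment (assumed to be read
    from KUBECTL_VSPHERE_PASSWORD), not onto the command line, where it would
    be visible in the process list and shell history.
    """
    argv = ["kubectl", "vsphere", "login", "--server", server,
            "--vsphere-username", username]
    if insecure:
        # Only for lab setups with self-signed Supervisor certificates.
        argv.append("--insecure-skip-tls-verify")
    env = dict(os.environ)
    env["KUBECTL_VSPHERE_PASSWORD"] = password
    return argv, env
```

Run it with `subprocess.run(argv, env=env, check=True)`. The parent process environment stays untouched because the function works on a copy of `os.environ`.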

Read More »Quick Tip: kubectl vsphere login without entering a Password

vSphere with Tanzu - SupervisorControlPlaneVM Excessive Disk WRITE IO

After deploying the latest version of VMware vSphere with Tanzu (vCenter Server 7.0 U1d / v1.18.2-vsc0.0.7-17449972), I noticed that the Virtual Machines running the Control Plane (SupervisorControlPlaneVM) had a constant disk write IO of 15 MB/s with over 3000 IOPS. This was something I didn't see in previous versions and as this is a completely new setup with no namespaces created yet, there must be an issue.

After troubleshooting the Supervisor Control Plane, it turned out that the problem was caused by fluent-bit, the log processor running in the Supervisor Cluster. Its log was constantly flooded with debug messages. Reducing the log level solved the problem for me.

[Update: 2021-03-14 - The problem is not resolved in vSphere 7.0 Update 2]
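To put those numbers into perspective, here is some simple arithmetic. At 15 MB/s and 3000 IOPS, the average write is about 5 KB, and a single VM writes roughly 1.3 TB per day. The sketch below performs that calculation in decimal units, using the observed values from the text.

```python
def write_profile(mb_per_s, iops):
    """Return (average write size in KB, data written per day in GB).

    Uses decimal units: 1 MB = 1000 KB and 1 GB = 1000 MB.
    """
    avg_write_kb = mb_per_s * 1000 / iops
    gb_per_day = mb_per_s * 86_400 / 1000  # 86,400 seconds per day
    return avg_write_kb, gb_per_day


if __name__ == "__main__":
    avg_kb, per_day = write_profile(15, 3000)
    print(f"average write: {avg_kb:.1f} KB, per day: {per_day:.0f} GB")
```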

Read More »vSphere with Tanzu - SupervisorControlPlaneVM Excessive Disk WRITE IO

Change TKG Cluster Service and Pod CIDR in Cloud Director 10.2

A major problem when deploying "vSphere with Tanzu" Clusters in VMware Cloud Director 10.2 is that the defaults for TKG Clusters overlap with the defaults for the Supervisor Cluster. The Supervisor Cluster defaults are configured in vCenter Server when you enable Workload Management.

When you deploy a Kubernetes Cluster using the new Container Service Extension in VCD 10.2, it deploys the cluster in a namespace on top of the Supervisor Cluster in vCenter Server. The Supervisor Cluster's Ingress and Services CIDRs must not overlap with 10.96.0.0/12 and 192.168.0.0/16, which are the defaults for TKG Clusters. Unfortunately, 10.96.0.0 is also the default when enabling Workload Management, so the deployment fails if you stick to the defaults. The following error message is displayed when networks overlap:

spec.settings.network.pods.cidrBlocks intersects with the network range of the external ip pools in network provider's configuration
spec.settings.network.pods.cidrBlocks intersects with the network range of the external ip pools LB in network provider's configuration
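Before enabling Workload Management, or before deploying a TKC with default settings, a simple check with Python's `ipaddress` module shows which Supervisor ranges collide with the TKC defaults named above. The Supervisor ranges in the example are hypothetical. Replace them with the Ingress, Egress and Services CIDRs of your own configuration.

```python
import ipaddress

# TKC defaults from the text: Services 10.96.0.0/12 and Pods 192.168.0.0/16.
TKC_DEFAULTS = {"services": "10.96.0.0/12", "pods": "192.168.0.0/16"}

# Hypothetical Supervisor Cluster settings from Workload Management.
SUPERVISOR = {
    "services": "10.96.0.0/24",
    "ingress": "192.168.100.0/24",
    "egress": "172.16.50.0/24",
}


def overlapping_ranges(supervisor, tkc=TKC_DEFAULTS):
    """Return (supervisor_name, supervisor_cidr, tkc_name, tkc_cidr) for each overlap."""
    hits = []
    for s_name, s_cidr in supervisor.items():
        s_net = ipaddress.ip_network(s_cidr)
        for t_name, t_cidr in tkc.items():
            if s_net.overlaps(ipaddress.ip_network(t_cidr)):
                hits.append((s_name, s_cidr, t_name, t_cidr))
    return hits


if __name__ == "__main__":
    for s_name, s_cidr, t_name, t_cidr in overlapping_ranges(SUPERVISOR):
        print(f"Supervisor {s_name} {s_cidr} overlaps TKC {t_name} {t_cidr}")
```

An empty result means the TKC defaults can be kept. Otherwise, the TKC Pod and Service CIDRs have to be changed for that cluster.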

This article explains a workaround that you can apply when deleting and reconfiguring the Namespace Management with non-overlapping addresses is not an option.

Read More »Change TKG Cluster Service and Pod CIDR in Cloud Director 10.2