With the release of Cloud Director 10.3 and Container Service Extension 3.1 (CSE), you have an additional option to deploy Kubernetes Clusters: "Tanzu Kubernetes Grid Multi-Cloud" aka. TKGm. With TKGm you now have 4 options to offer Kubernetes as a Service for your customers.
- TKGm (Multi-Cloud)
- TKGs (vSphere with Tanzu)
- TKG-I (Enterprise PKS)
Yes, there is a reason why TKGm and TKGs are in bold letters. If you are starting today, forget about "Native" and "TKG-I". "TKGm" works similar to "Native" but is far superior. TKG-I (TKG Integrated Edition, formerly known as VMware Enterprise PKS) is deprecated as of CSE 3.1 and will be removed in future releases.
This article explains how to integrate CSE 3.1 in VMware Cloud Director 10.3.
Container Service Extension (CSE) is a VMware Cloud Director extension that allows tenants to create and work with Kubernetes clusters.
This guide explains how to prepare your Cloud Director to support TKGm Clusters. If you want to use the vSphere 7.0 integrated Tanzu Kubernetes, see this guide. The CSE Server is only required for Native and TKGm Clusters. TKGs Clusters (vSphere with Tanzu) can be managed with the VCD itself.
TKGm Clusters are deployed into the customer's Organization networks, similar to "Native" Clusters. You do not need vSphere with Tanzu to deploy TKGm.
If you've previously worked with CSE, there is good news. Since CSE 3.1 with VCD 10.3, you no longer have to maintain a RabbitMQ (AMQP) Server. CSE 3.1 uses MQTT, which runs directly on Cloud Director cells.
CSE is Python-based and can be installed on many Linux-based Operating Systems. It is very strict on python versions and requires version 3.7.3 or greater (3.7.x) but not 3.8! VMware Photon OS 3.0 Revision 3 uses Python 3.7 as default, so I'm using it for my CSE Server.
Check the Compatibility Matrix for CSE 3.1. During this article, I'm using the following versions:
- Cloud Director 10.3.1
- Container Service Extension 3.1.1
- NSX-T 184.108.40.206
- NSX-ALB 21.1.1
- vCenter 7.0 U3
Step 1 - Install and Configure Photon OS 3.0
As mentioned earlier, Photon OS 3.0 works great with CSE. Currently, CSE is not available as Appliance, so you have to download and install it yourself.
Photon OS Download Page > Photon OS 3.0 Rev 3 > OVA-hw11
Make sure to have the following OBA: photon-hw11-3.0-a383732.ova
Deploy it and configure a static IP address and FQDN (# hostnamectl set-hostname cse.virten.lab). It is not required to expose CSE to the Internet, so it should be placed on an internal management network, close to vCenter and Cloud Director. CSE talks directly to VCD and uses the MQTT to listen for orders.
If you want to automate the deployment, you can use cloud-init and PowerShell. I've deployed mine using this article, but installing it manually is also fine.
At the end of this Step, you should be able to log in on the Photon OS VM as root.
Step 2 - Prepare a CSE Seed Tenant in Cloud Director
CSE requires a Tenant in Cloud Director to install, store and share Templates. Go to VCD and create:
- Organization for CSE
- Organization VDC for CSE
- Edge Gateway
- Routed Network with Internet Access
- IP Pools and DNS configured on the Routed Network
I've deployed everything with Terraform. See terraform-examples/vcd-cse_tenant
Step 3 - Install and Configure CSE 3.1
Update all Photon Packages to the latest version.
tdnf update --assumeyes
Install required Packages.
tdnf install git python3-pip python3-devel build-essential --assumeyes
Create a user that will be used to run CSE. For obvious security reasons, you should not run CSE as root.
groupadd cse mkdir /opt useradd cse -g cse -m -d /opt/cse chmod 755 /opt
Change the user context to cse.
su - cse
Create and activate a virtual Python Environment.
python3 -m venv /opt/cse/python source /opt/cse/python/bin/activate
Install CSE using PIP. This will install all required dependencies including vcd-cli.
Note: The current stable version of CSE is available in the PIP repository. The second command allows you to install CSE directly from GIT. This might be useful for future releases that are not yet available. See vmware/container-service-extension/tags for a list of available versions.
pip3 install container-service-extension==3.1.1 #pip3 install git+https://firstname.lastname@example.org
Verify that cse has been installed correctly.
(python) cse [ ~ ]$ cse version CSE, Container Service Extension for VMware vCloud Director, version 3.1.1
Add the CSE extension to vcd-cli.
mkdir /opt/cse/.vcd-cli cat > /opt/cse/.vcd-cli/profiles.yaml << EOF extensions: - container_service_extension.client.cse EOF
Verify that the vcd-cli extension works.
(python) cse [ ~ ]$ vcd cse version CSE, Container Service Extension for VMware vCloud Director, version 3.1.1
Create the CSE Role in Cloud Director. Use your VCD hostname instead of vcloud.virten.lab. The -s option can be used when VCD used untrusted certificates. Enter VCD System Administrator credentials after you run the command.
cse create-service-role vcloud.virten.lab -s
Create a CSE Service User. When using vcd-cli commands, you have to log in and the session will timeout after some time. You might have to log in again through the guide to use vcd-cli commands. The command returns an error when you are no longer logged in. The -i option is required if you are using untrusted certificates.
vcd login vcloud.virten.lab system administrator -i vcd user create --enabled cse VMware1! "CSE Service Role"
CSE can use environment variables to store the config path and password. Add those to .bash_profil to automatically set them when you log in. This will automatically enable the Python Virtual Environment.
Note: the "CSE_TKG_M_ENABLED=True" variable is no longer required in CSE 3.1
cat >> /opt/cse/.bash_profile << EOF export CSE_CONFIG=/opt/cse/config/config.yaml export CSE_CONFIG_PASSWORD=VMware1! source /opt/cse/python/bin/activate EOF
Source the file to activate changes in the current session.
Get a configuration example and write it to a file.
mkdir /opt/cse/config cse sample > /opt/cse/config/config-decrypted.yaml
Edit config-decrypted.yaml to match your environment. The "enable_tkg_m: true" that was used in previous versions is no longer required. This is what my file looks like:
mqtt: verify_ssl: false vcd: host: vcloud.virten.lab log: true password: PASSWORD port: 443 username: cse verify: false vcs: - name: vcenter.virten.lab password: PASSWORD username: email@example.com verify: false service: enforce_authorization: false legacy_mode: false log_wire: false no_vc_communication_mode: false processors: 15 telemetry: enable: true broker: catalog: cse ip_allocation_mode: pool network: cse-builder org: cse remote_template_cookbook_url: https://raw.githubusercontent.com/vmware/container-service-extension-templates/master/template_v2.yaml storage_profile: 'StorageGold' vdc: cse
Encrypt the configuration. The CSE_CONFIG_PASSWORD environment variable is used.
cse encrypt /opt/cse/config/config-decrypted.yaml --output /opt/cse/config/config.yaml chmod 600 /opt/cse/config/config.yaml
Verify the configuration
(python) cse [ ~ ]$ cse check Required Python version: >= 3.7.3 Installed Python version: 3.7.5 (default, Oct 25 2021, 23:57:02) [GCC 7.3.0] Decrypting '/opt/cse/config/config.yaml' Validating config file '/opt/cse/config/config.yaml' InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. Connected to vCloud Director (vcloud.virten.lab:443) Connected to vCenter Server 'vcenter.virten.lab' as 'firstname.lastname@example.org' (vcenter.virten.lab) Config file '/opt/cse/config/config.yaml' is valid
Add your SSH public key to the user. This will allow you to directly login, and you can copy them to TKG images for troubleshooting reasons.
mkdir -p ~/.ssh cat >> ~/.ssh/authorized_keys << EOF ssh-rsa AAAAB3NzaC[...] fgr EOF
With this configuration, CSE is able to provide "Native" Images. Now you have to import TKGm Images, which need to be downloaded from the VMware website. To do so, you have to go to vmware.com > Resources > Product Downloads > Tanzu Kubernetes Grid. Please make sure to download the correct images. According to the documentation, only "Ubuntu 20.04 based" TKG Images are supported.
Direct Link: VMware Tanzu Kubernetes Grid 1.4.0 Download
You should have one (or more) of the following images:
Copy the .ova files to the CSE Server and import them using the cse template import command:
cse template import -F ~/ubuntu-2004-kube-v1.21.2+vmware.1-tkg.1-7832907791984498322.ova cse template import -F ~/ubuntu-2004-kube-v1.19.12+vmware.1-tkg.1-15841320193950299489.ova cse template import -F ~/ubuntu-2004-kube-v1.20.8+vmware.1-tkg.1-17589475007677388652.ova
Now you can start the installation. This command will take about 10 to 30 minutes as it needs to download images, configure Kubernetes and put them into the Cloud Director catalog.
cse install -k ~/.ssh/authorized_keys
When the installation is finished, start the CSE server in interactive mode to see if it can start successfully. The output should look like this:
(python) cse [ ~ ]$ cse run Required Python version: >= 3.7.3 Installed Python version: 3.7.5 (default, Oct 25 2021, 23:57:02) [GCC 7.3.0] CSE version: 3.1.1 Decrypting '/opt/cse/config/config.yaml' Validating config file '/opt/cse/config/config.yaml' InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. Connected to vCloud Director (vcloud.virten.lab:443) Connected to vCenter Server 'vcenter.virten.lab' as 'email@example.com' (vcenter.virten.lab) Config file '/opt/cse/config/config.yaml' is valid Using RDE version: 2.0.0 Successfully loaded defined entity schema urn:vcloud:type:cse:nativeCluster:2.0.0 to global context Loading k8s template definition from catalog Found K8 template 'photon-v2_k8-1.14_weave-2.5.2' at revision 4 in catalog 'cse' Found K8 template 'ubuntu-16.04_k8-1.18_weave-2.6.5' at revision 2 in catalog 'cse' Found K8 template 'ubuntu-16.04_k8-1.19_weave-2.6.5' at revision 2 in catalog 'cse' Found K8 template 'ubuntu-16.04_k8-1.20_weave-2.6.5' at revision 2 in catalog 'cse' Found K8 template 'ubuntu-16.04_k8-1.21_weave-2.8.1' at revision 1 in catalog 'cse' Loading TKGm template definition from catalog Found TKGm template ubuntu-2004-kube-v1.20.8+vmware.1-tkg.1-17589475007677388652. Loading kubernetes runtime placement policies. Template rules are not supported by CSE for vCD api version 35.0 or above. Skipping template rule processing. Validating CSE installation according to config file MQTT extension and API filters found Found catalog 'cse' CSE installation is valid Started thread 'MessageConsumer' (140098913969920) Started thread 'ConsumerWatchdog' (140098905577216) Container Service Extension for vCloud Director Server running using config file: /opt/cse/config/config.yaml Log files: /opt/cse/.cse-logs/cse-server-info.log, /opt/cse/.cse-logs/cse-server-debug.log waiting for requests (ctrl+c to close)
Press CTRL+C to stop the server. Now we will make it run as a service. This requires you to write a startup script and a systemd unit file. You can use the official examples of cse.sh and cse.service. Here are the files that I use.
cat > /opt/cse/cse.sh << EOF #!/usr/bin/env bash CSE_VENV_PATH=/opt/cse source $CSE_VENV_PATH/python/bin/activate export CSE_CONFIG_PASSWORD='VMware1!' export CSE_CONFIG=$CSE_VENV_PATH/config/config.yaml cse run EOF
Make the file executable
chmod +x /opt/cse/cse.sh
The Unit file needs to be created as root.
cat > /etc/systemd/system/cse.service << EOF [Unit] Description=Container Service Extension for VMware vCloud Director Wants=network-online.target After=network-online.target [Service] ExecStart=/opt/cse/cse.sh User=cse WorkingDirectory=/opt/cse Type=simple Restart=always [Install] WantedBy=default.target EOF
Enable and Start the service
root@cse [ ~ ]# systemctl enable cse.service root@cse [ ~ ]# systemctl start cse.service
Verify that the service is running.
root@cse [ ~ ]# systemctl status cse.service
That's it for the CSE Server. You can keep the SSH session open to check log files for debugging:
- /opt/cse/.cse-logs/cse-server-debug.log <- Your holy grail for troubleshooting CSE issues
Step 4 - Cloud Director Configuration to enable Tenants for TKGm
Publish the Container UI Plugin to Tenants
Login to Cloud Director as the System Administrator navigate to More > Customize Portal, select Container UI Plugin and press PUBLISH. You can either enable all tenants or select which tenants you want to enable for CSE.
I've only selected a single tenant.
Now, when you log in as the tenant, you should see the plugin within the More tab.
Edit and Publish the cse:nativeCluster Entitlement
To allow tenants to create TKGm clusters, you have to publish the "cse:nativeCluster Entitlement" Rights Bundle. With the bundle itself, they can only deploy TKGm clusters, not "Native" Clusters, which is fine for me. To enable Native Clusters you additionally have to explicitly enable the oVDC using the vcd-cli tool. I don't do it here but you can see how it works here.
Prior to publishing the bundle, you have to add a few rights. This is new in CSE 3.1 and documented here. Navigate to Administration > Tenant Access Control > Rights Bundles, select cse:nativeCluster Entitlement and press EDIT.
Make sure that the following rights are enabled:
- LIBRARIES > Catalog > View > View Shared Catalogs from Other Organizations
- ACCESS CONTROL > User > Manage > Manage user's own API token
- NETWORKING > Gateway > View > View Gateway
- NETWORKING > Gateway Services > View > NAT View Only
- NETWORKING > Gateway Services > Manage > NAT Configure
- NETWORKING > Gateway Services > View > Load Balancer View Only
- NETWORKING > Gateway Services > Manage > Load Balancer Configure
- COMPUTE > Organization VDC > Manage > Create a Shared Disk
Press SAVE. Then mark the Rights Bundle again, click PUBLISH and publish the bundle to all or selective tenants.
With the rights bundle, tenants are able to create their own role that allows them to create TKGm clusters. This is quite uncomfortable, so I edit the Global Organization Administrator to include the right. Navigate to Administration > Tenant Access Control > Global Roles, select the Organization Administrator and press EDIT. Scroll down to OTHER, activate View and Manage and press SAVE.
Note: Only the CSE:NATIVECLUSTER rights are actually required for TKGm. I think having all rights active in the Role and further specifying permissions using Rights Bundles is more convenient.
Thats' it! Your tenants can now start and deploy TKGm!
Step 5 - Deploy a TKGm Kubernetes Cluster as Tenant
Login as a Tenant and navigate to More > Kubernetes Container Clusters, press NEW and follow the Wizard.
Note: Do NOT disable "Allow external traffic to be routed to this cluster", this feature is required by TKGm!
When the cluster has been deployed you can download the kubeconfig.
With this configuration, you can login using kubectl. Please be aware that with the default configuration, the Kubernetes apiserver is exposed to the internet. You might want to create Edge Gateway Firewall rules to limit the access.
# export KUBECONFIG=k8s-fgr # kubectl get nodes NAME STATUS ROLES AGE VERSION mstr-sqpd Ready control-plane,master 161m v1.20.5+vmware.2 node-knp3 Ready 156m v1.20.5+vmware.2 node-zrce Ready 157m v1.20.5+vmware.2 # kubectl get pods -A NAMESPACE NAME READY STATUS RESTARTS AGE kube-system antrea-agent-47j2t 2/2 Running 0 94m kube-system antrea-agent-d7dms 2/2 Running 0 91m kube-system antrea-agent-hslbg 2/2 Running 0 89m kube-system antrea-controller-5cd95c574d-5z2np 1/1 Running 0 94m kube-system coredns-6598d898cd-9z55d 1/1 Running 0 94m kube-system coredns-6598d898cd-gf5qc 1/1 Running 0 94m kube-system csi-vcd-controllerplugin-0 3/3 Running 0 94m kube-system csi-vcd-nodeplugin-fwfz2 2/2 Running 0 90m kube-system csi-vcd-nodeplugin-xjz9l 2/2 Running 0 89m kube-system etcd-mstr-tfgn 1/1 Running 0 94m kube-system kube-apiserver-mstr-tfgn 1/1 Running 0 94m kube-system kube-controller-manager-mstr-tfgn 1/1 Running 0 94m kube-system kube-proxy-8hbvz 1/1 Running 0 89m kube-system kube-proxy-lsqxv 1/1 Running 0 94m kube-system kube-proxy-r4twn 1/1 Running 0 91m kube-system kube-scheduler-mstr-tfgn 1/1 Running 0 94m kube-system vmware-cloud-director-ccm-5489b6788c 1/1 Running 0 94m
All Pods should be up and running. If not, there might be an Issue with TKG to VCD communication. In beta releases, you had to manually configure VCD credentials at this point. This is no longer required in the final product. You automatically have a token in the vcloud-basic-auth secret:
# kubectl get secret -n kube-system vcloud-basic-auth -o yaml apiVersion: v1 kind: Secret data: password: "" refreshToken: NlVtRVNpVjFkNVhxNDBMS2JNcDBZa0xRbU4zMnpIdXg= username: ""