
Troubleshooting

Regenerate Standalone ESXi Host Certificate

On a freshly installed ESXi host, the following error is displayed:

The certificate assigned to this host is not valid yet. You should install a valid certificate.

The issue is caused by a system time that was set in the future during the ESXi installation. An incorrect system time can also cause problems when adding the ESXi host to vCenter Server. To solve the issue, set the correct time (best practice is to use an NTP server) and regenerate the certificate.
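For a standalone host, a minimal sketch of the fix from an SSH session looks like this; it assumes ESXi 7.0-style esxcli syntax (the esxcli system ntp namespace is not available on older releases, where NTP is configured through the Host Client instead) and that briefly restarting the management agents is acceptable:

# Point the host at an NTP server and verify the time
esxcli system ntp set --server=pool.ntp.org --enabled=true
esxcli system time get

# Regenerate the self-signed host certificate and restart the management agents
/sbin/generate-certificates
/etc/init.d/hostd restart && /etc/init.d/vpxa restart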

Read More »Regenerate Standalone ESXi Host Certificate

Installation or Removal of VIB Packages in ESXi 7.0 fails with Error: Failed to query file system stats:

While installing ESXi updates, I noticed that on one of my hosts, the installation or removal of VIB packages fails with the following error message:

# esxcli software vib install -d [package]
# esxcli software vib remove -n [package]
[InstallationError]
Failed to query file system stats: Errors:
Error getting data for filesystem on '/vmfs/volumes/59a83d9c-628c6ae0-7b35-f44d306ec05a': Cannot open volume: /vmfs/volumes/59a83d9c-628c6ae0-7b35-f44d306ec05a, skipping.
cause = Errors:
Error getting data for filesystem on '/vmfs/volumes/59a83d9c-628c6ae0-7b35-f44d306ec05a': Cannot open volume: /vmfs/volumes/59a83d9c-628c6ae0-7b35-f44d306ec05a, skipping.
Please refer to the log file for more details.

The device 59a83d9c-628c6ae0-7b35-f44d306ec05a was a non-existent volume, referenced by a VFFS mount. VFFS (Virtual Flash File System) was used in earlier ESXi releases by vSphere Flash Read Cache. I'm not sure where that mount comes from, but this is how you can remove the stale mount:
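Before removing anything, a quick sanity check from the ESXi shell confirms that the UUID really has no backing volume (the removal steps themselves follow after the break); both commands are standard on ESXi 7.0:

# List all mounted filesystems; the stale VFFS UUID should not resolve to a healthy volume
esxcli storage filesystem list
# Querying the path directly typically fails for a stale mount
vmkfstools -P /vmfs/volumes/59a83d9c-628c6ae0-7b35-f44d306ec05a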

Read More »Installation or Removal of VIB Packages in ESXi 7.0 fails with Error: Failed to query file system stats:

Troubleshooting CSE 3.1 TKGm Integration with VMware Cloud Director 10.3

This article recaps issues that I had during the integration of VMware Container Service Extension 3.1, which allows the deployment of Tanzu Kubernetes Grid clusters (TKGm) in VMware Cloud Director 10.3.

If you are interested in an Implementation Guide, refer to Deploy CSE 3.1 with TKGm Support in VCD 10.3 and First Steps with TKGm Guest Clusters in VCD 10.3.

  • CSE Log File Location
  • DNS Issues during Photon Image Creation
  • Disable rollbackOnFailure to troubleshoot TKGm deployment errors
  • Template cookbook version 1.0.0 is incompatible with CSE running in non-legacy mode
  • https://[IP-ADDRESS] should have a https scheme and match CSE server config file
  • 403 Client Error: Forbidden for url: https://[VCD]/oauth/tenant/demo/register
  • NodeCreationError: failure on creating nodes ['mstr-xxxx']
  • Force Delete TKGm Clusters / Can't delete TKGm Cluster / Delete Stuck in DELETE:IN_PROGRESS

Read More »Troubleshooting CSE 3.1 TKGm Integration with VMware Cloud Director 10.3

Deploy NSX-T Edge VM SSH Keys with Ansible

While working with NSX-T, there are many reasons to access edge appliances using SSH. Most troubleshooting options are only available through nsxcli on the appliance itself. During the deployment, each appliance gets three user accounts: root, admin, and audit. All accounts are configured with password-based authentication. In a previous article, I've already described how to deploy SSH keys using nsxcli, which allows a secure and convenient authentication method. In this article, I'm explaining how to use Ansible to deploy SSH public keys to NSX-T Edges, which makes it easy to manage keys on a large platform.
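As a rough sketch of the idea (not the full playbook from the article), an ad-hoc Ansible run with the raw module can push a key to every edge in an inventory group. The raw module is used because the edge appliances present nsxcli instead of a regular shell; the exact ssh-keys command syntax shown here is an assumption based on the nsxcli article mentioned above and should be verified against your NSX-T version:

# inventory.ini contains an [edges] group with the edge management addresses
# -u admin -k prompt for the admin password, -m raw sends the command to nsxcli as-is
ansible edges -i inventory.ini -u admin -k -m raw \
  -a "set user admin ssh-keys label ansible-key type ssh-rsa value AAAAB3NzaC1yc2E..."

A playbook-based version of the same idea scales better when keys have to be rotated regularly across many edges.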

Read More »Deploy NSX-T Edge VM SSH Keys with Ansible

Error when connecting Virtual Machine to NSX-T Segments

When you try to connect an NSX-T based Segment to a virtual machine, the task fails with the following error message:

Reconfigure virtual machine - An error occurred during host configuration

In the nsx logfile on the ESXi host where the VM is located, the following error is displayed:

/var/log/nsx-syslog.log
2021-03-13T19:00:36Z nsx-opsagent[527252]: NSX 527252 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxa" tid="527596" level="ERROR" errorCode="MPA44211"] [PortOp] Failed to create port 780b915d-1479-4eed-8e29-2364d9563f95 with VIF f3f605f2-38a1-4263-bbbd-81b189077f69 because DVS id is not found by transport-zone id 1b3a2f36-bfd1-443e-a0f6-4de01abc963e
2021-03-13T19:00:36Z nsx-opsagent[527252]: NSX 527252 - [nsx@6876 comp="nsx-esx" subcomp="opsagent" s2comp="nsxa" tid="527596" level="ERROR" errorCode="MPA42001"] [CreateLocalDvPort] createPort(uuid=780b915d-1479-4eed-8e29-2364d9563f95, zone=1b3a2f36-bfd1-443e-a0f6-4de01abc963e) failed: Failed to create port 780b915d-1479-4eed-8e29-2364d9563f95 with VIF f3f605f2-38a1-4263-bbbd-81b189077f69 because DVS id is not found by transport-zone id 1b3a2f36-bfd1-443e-a0f6-4de01abc963e

 

Read More »Error when connecting Virtual Machine to NSX-T Segments

vSphere with Tanzu - SupervisorControlPlaneVM Excessive Disk WRITE IO

After deploying the latest version of VMware vSphere with Tanzu (vCenter Server 7.0 U1d / v1.18.2-vsc0.0.7-17449972), I noticed that the Virtual Machines running the Control Plane (SupervisorControlPlaneVM) had a constant disk write IO of 15 MB/s with over 3000 IOPS. This was something I hadn't seen in previous versions, and as this was a completely new setup with no namespaces created yet, there had to be an issue.

After troubleshooting the Supervisor Control Plane, it turned out that the problem was caused by fluent-bit, the log processor used by the Kubernetes control plane. The log was constantly being spammed with debug messages. Reducing the log level solved the problem for me.
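For reference, fluent-bit's verbosity is controlled by the Log_Level key in the [SERVICE] section of its configuration; a snippet like the one below turns debug spam down to info. Where exactly that configuration lives on the SupervisorControlPlaneVM, and how to change it there, is covered in the full article:

[SERVICE]
    # valid values: error, warning, info, debug, trace
    Log_Level    info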

[Update: 2021-03-14 - The problem is not resolved in vSphere 7.0 Update 2]

Read More »vSphere with Tanzu - SupervisorControlPlaneVM Excessive Disk WRITE IO

Heads Up: VMFS6 Heap Exhaustion in ESXi 7.0

In ESXi 7.0 (Build 15843807) and 7.0b (Build 16324942), there is a known issue with the VMFS6 filesystem: in certain workflows, memory is not freed correctly, resulting in VMFS heap exhaustion. The problem is solved in ESXi 7.0 Update 1. You might be affected when your system shows the following symptoms:

  • Datastores are showing "Not consumed" on hosts
  • Virtual Machines fail to vMotion
  • Virtual Machines become orphaned when powered off
  • Snapshot creation fails with "An error occurred while saving the snapshot: Error."

In the vmkernel.log, you see the following error messages:

  • Heap vmfs3 already at its maximum size. Cannot expand
  • Heap vmfs3: Maximum allowed growth (#) too small for size (#)
  • Failed to initialize VMFS distributed locking on volume #: Out of memory
  • Failed to get object 28 type 1 uuid # FD 0 gen 0: Out of memory
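A quick way to check whether a host is already hitting the issue is to search vmkernel.log for the heap messages (default log location assumed):

# Any match indicates VMFS heap pressure on this host
grep -i "heap vmfs3" /var/log/vmkernel.log
grep -i "Out of memory" /var/log/vmkernel.log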

Read More »Heads Up: VMFS6 Heap Exhaustion in ESXi 7.0

Quick Tip: Reset Tanzu SupervisorControlPlaneVM Alarms

When you are working with the Kubernetes integration in vSphere 7.0, you might come into the situation where a SupervisorControlPlaneVM has an active alarm. Those Virtual Machines are deployed and controlled by the WCP agent, and even as an Administrator, you are not allowed to touch those objects. You can't power them off, reboot them, or migrate them using vMotion. The problem is that you can't even clear alarms. One alarm I recently had was the "vSphere HA virtual machine failover failed" alarm, which you usually see when the ESXi hostd crashed but the Virtual Machines are still running.

Read More »Quick Tip: Reset Tanzu SupervisorControlPlaneVM Alarms