VMware vCenter 5.0 U1a has been released. This patch contains a permanent fix for the Storage vMotion/dvSwitch bug that caused HA to fail in some cases after a hardware failure. I've tested the update in my home lab. The fix works as expected and enables vCenter to move the dvSwitch port information to the appropriate datastore during Storage vMotion. There is one limitation: the patch prevents further virtual machines from becoming affected, but it does not repair virtual machines that are already affected. This means that you still have to fix affected virtual machines after installation.
- Back up the vCenter Server database
- Install vCenter Server 5.0.0 U1a (Build 757163)
- Fix affected virtual machines using one of the scripts below
VMware also released a patch for the ESXi hypervisor, which makes build 768111 the current version.
Today I tried to explain why this Storage vMotion / dvSwitch / HA problem actually exists, how virtual machines become affected and what can be done to mitigate the problem. The issue seems to be hard to explain, so I started searching the internet for pictures. I found many explanations, but nobody had drawn a picture of it. So I did:
- We have 2 ESXi hosts, both connected to shared storage containing 2 LUNs. The first ESXi host runs two virtual machines (VM1 & VM2), the second runs only one (VM3). All three virtual machines are connected to a Distributed Virtual Switch. Each LUN contains an additional folder holding the dvSwitch port information (.dvsData). ESXi hosts need this information to know where the ports belong without asking vCenter.
- Using Storage vMotion, VM2 is migrated to LUN 2. This can be done manually or triggered by Storage DRS. This is where the bug happens: all files such as .vmx and .vmdk are moved to LUN 2, but for whatever reason the dvSwitch port information remains on LUN 1. The bug has occurred, but nothing is noticeable at this point. VM2 stays up and running without any network issues.
- The first ESXi host dies. This is where HA should become active and initiate a restart on another host.
- HA initiates a restart of VM1 on the second ESXi host. Everything is fine.
- HA initiates a restart of VM2 on the second ESXi host. During this process, HA tries to access the port information inside the .dvsData directory. HA fails because it can't find the port information in the .dvsData directory on LUN 2.
Operation failed, diagnostics report: Failed to open file /vmfs/volumes/UUID/.dvsData/ID/Port Status (bad0003)= Not found
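The scenario above can be sketched as a small Python model. This is only an illustration of the logic, not how ESXi or HA are implemented: the class and function names (`VM`, `Datastore`, `storage_vmotion`, `ha_restart`, `affected_vms`) and the dvSwitch and port IDs are made up for the example, and the .dvsData folder is modelled as a simple set of (dvSwitch ID, port ID) pairs.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    datastore: str  # datastore that holds the .vmx/.vmdk files
    dvs_id: str     # hypothetical dvSwitch ID
    port_id: str    # hypothetical dvSwitch port ID


@dataclass
class Datastore:
    name: str
    # Models .dvsData/<dvSwitch ID>/<port ID> as a set of (dvs_id, port_id) pairs.
    dvs_data: set = field(default_factory=set)


def storage_vmotion(vm, source, target, move_port_file):
    """Move the VM files to the target datastore.

    move_port_file=False reproduces the bug: the port file stays behind.
    """
    vm.datastore = target.name
    if move_port_file:
        source.dvs_data.discard((vm.dvs_id, vm.port_id))
        target.dvs_data.add((vm.dvs_id, vm.port_id))


def ha_restart(vm, datastores):
    """Model HA looking for the port file on the datastore that holds the VM."""
    ds = datastores[vm.datastore]
    if (vm.dvs_id, vm.port_id) not in ds.dvs_data:
        raise FileNotFoundError(
            f"/vmfs/volumes/{ds.name}/.dvsData/{vm.dvs_id}/{vm.port_id}"
        )
    return f"{vm.name} restarted"


def affected_vms(vms, datastores):
    """Return the names of VMs whose port file is missing on their datastore."""
    return [
        vm.name
        for vm in vms
        if (vm.dvs_id, vm.port_id) not in datastores[vm.datastore].dvs_data
    ]


# Build the lab from the walkthrough: VM1 and VM2 on LUN1, VM3 on LUN2.
lun1 = Datastore("LUN1", {("dvs-1", "10"), ("dvs-1", "11")})
lun2 = Datastore("LUN2", {("dvs-1", "12")})
datastores = {"LUN1": lun1, "LUN2": lun2}
vms = [
    VM("VM1", "LUN1", "dvs-1", "10"),
    VM("VM2", "LUN1", "dvs-1", "11"),
    VM("VM3", "LUN2", "dvs-1", "12"),
]

# Storage vMotion of VM2 with the bug present, then list affected VMs.
storage_vmotion(vms[1], lun1, lun2, move_port_file=False)
print(affected_vms(vms, datastores))
```

In this model, the check in `affected_vms` is the same comparison a fix script has to make: for each VM, is the port file present on the datastore that currently holds the VM?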
Issue explained by Duncan Epping @ Yellow-Bricks
Script to identify and fix affected VMs by Alan Renouf
Placing the datastore clusters inside a folder is not always an option, so I decided to write a PowerCLI script which recreates the permissions after a vCenter service restart. As you might know, all permissions set at datastore cluster level are gone after vCenter restarts. This workaround refers to VMware KB: 2008326.
First you have to find the affected permissions. These are the permissions that are set directly on datastore clusters. A datastore cluster is referred to as "StoragePod", so that is the keyword to filter on.
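The filtering idea can be sketched in Python as follows. This is not the PowerCLI script itself and does not talk to vCenter; the `Permission` record, its field names and the sample data are assumptions made up for the illustration. It only shows the logic: keep the permissions whose entity type is "StoragePod", and after a restart compare them with what is still present.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    entity_type: str  # e.g. "StoragePod", "Folder", "Datastore"
    entity_name: str
    principal: str
    role: str
    propagate: bool


def storage_pod_permissions(permissions):
    """Keep only permissions set directly on datastore clusters."""
    return [p for p in permissions if p.entity_type == "StoragePod"]


def missing_after_restart(saved, current):
    """Return saved permissions that no longer exist and must be recreated."""
    current_set = set(current)
    return [p for p in saved if p not in current_set]


# Hypothetical sample data.
before = [
    Permission("StoragePod", "DS-Cluster-01", "DOMAIN\\vm-admins", "Allocate", True),
    Permission("Folder", "Production", "DOMAIN\\vm-admins", "VM-Creator", True),
]
saved = storage_pod_permissions(before)

# After the restart only the folder permission is left.
after = [before[1]]
for p in missing_after_restart(saved, after):
    print(f"recreate: {p.principal} / {p.role} on {p.entity_name}")
```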
After migrating datastores to datastore clusters and adding permissions at datastore cluster level, I ran into puzzling issues where users suddenly could no longer create VMs. Users got an error message while selecting the cluster:
You do not have the privilege 'Datastore > Allocate space' on the datastore connected to the selected Cluster
I checked the vCenter permissions and noticed that the datastore permission was missing. I remembered a bug that causes all vCenter permissions to disappear after renaming Windows users or groups, so I just reassigned the permission. Shortly afterwards the problem recurred, so I searched the VMware KB and found that this is a known issue: KB: 2008326
VMware's resolution is to place the datastore cluster inside a folder, set the permissions on that folder and propagate them. Unfortunately you can't simply move the cluster:
Move entities - The specified folder does not support this operation.
The solution is to create a new datastore cluster inside the folder, recreate all settings and move the VMFS datastores into the new cluster. This can be done without any interruption to the running virtual machines.