After upgrading my lab to vSAN 6.6, I noticed that two of the newly introduced vMotion health checks (vSAN Cluster > Monitor > vSAN > Health > Network) were reported as failed:
- Failed - vMotion: Basic (unicast) connectivity check
- Failed - vMotion: MTU check (ping with large packet size)
I verified that vMotion works as expected and that the ESXi hosts can successfully ping each other over the vMotion network. To find the cause, I checked the vsanmgmt.log file, which revealed the following:
VSANMGMTSVC: ERROR vsanperfsvc[x] [VsanHealthUtil::wrapper] Error to run _getIPRouteListFromEsxCLI
VSANMGMTSVC: WARNING vsanperfsvc[x] [VsanHealthPing::CreateSocket] Create ping socket Exception: [Errno99] Cannot assign requested address
VSANMGMTSVC: INFO vsanperfsvc[x] [VsanHealthPing::Ping] Pinger: ping target number: 2
VSANMGMTSVC: ERROR vsanperfsvc[x] [VsanHealthPing::Ping] Cannot find ping socket for local ip 192.168.225.121
The health check is unable to bind to the vMotion VMkernel adapter. This looks like the same error you get when you use the ping command against a non-default network stack instance without using the -S parameter to select the corresponding stack.
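The "Cannot assign requested address" error can be reproduced outside of vSAN. The following Python sketch is only an analogy and not the health check's actual code: it binds a UDP socket (the health check presumably uses a different socket type) to an address that is not configured in the current network stack. The function name bind_error and the address 192.0.2.1 are made up for this illustration.

```python
import errno
import socket


def bind_error(local_ip):
    """Try to bind a UDP socket to local_ip; return the errno on failure, else None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((local_ip, 0))  # port 0 lets the OS pick any free port
        except OSError as exc:
            return exc.errno
    return None


# 192.0.2.1 is from the TEST-NET-1 documentation range and is assumed
# not to be configured on the machine running this sketch.
err = bind_error("192.0.2.1")
print(err == errno.EADDRNOTAVAIL)  # EADDRNOTAVAIL is errno 99 on Linux
```

From the point of view of the default TCP/IP stack, an address that lives only in the vMotion stack is just as "not local" as the placeholder address used here.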
To verify the finding, I moved the VMkernel adapter to the default TCP/IP stack (which means deleting and recreating the adapter).
After changing all VMkernel adapters, I reran the health check and all tests turned green.
The vsanmgmt.log file now also shows successful ping tests:
VSANMGMTSVC: INFO vsanperfsvc[x] [VsanHealthSystemImpl::_QueryVerifyNetworkSettings] Query network settings: (str) ['10.100.0.122','10.100.0.123' ]
VSANMGMTSVC: INFO vsanperfsvc[x] [VsanHealthPing::Ping] Run ping test for the hosts ['192.168.225.122', '192.168.225.123'] from local 192.168.225.121
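If you want to check several hosts for these entries, a small script can do the searching. This is a minimal sketch that only matches the two messages quoted in this post; the function name summarize_ping_log is made up, and other vSAN builds may word the messages differently.

```python
import re

# Patterns copied from the log excerpts quoted in this post.
FAILED = re.compile(r"Cannot find ping socket for local ip (\S+)")
TESTED = re.compile(r"Run ping test for the hosts \[.*?\] from local (\S+)")


def summarize_ping_log(lines):
    """Return (failed_local_ips, tested_local_ips) found in vsanmgmt.log lines."""
    failed, tested = set(), set()
    for line in lines:
        match = FAILED.search(line)
        if match:
            failed.add(match.group(1))
        match = TESTED.search(line)
        if match:
            tested.add(match.group(1))
    return sorted(failed), sorted(tested)


sample = [
    "VSANMGMTSVC: ERROR vsanperfsvc[x] [VsanHealthPing::Ping] "
    "Cannot find ping socket for local ip 192.168.225.121",
    "VSANMGMTSVC: INFO vsanperfsvc[x] [VsanHealthPing::Ping] Run ping test "
    "for the hosts ['192.168.225.122', '192.168.225.123'] from local 192.168.225.121",
]
print(summarize_ping_log(sample))
```

To run it against a real log, pass an open file object instead of the sample list, for example summarize_ping_log(open(path)), where path is wherever your copy of vsanmgmt.log is stored.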
The problem is documented as a known issue in the vSAN 6.6 Release Notes:
vMotion network connectivity test incorrectly reports ping failures
The vMotion network connectivity test (Cluster > Monitor > vSAN > Health > Network) reports ping failures if the vMotion stack is used for vMotion. The vMotion network connectivity (ping) check only supports vmknics that use the default network stack. The check fails for vmknics using the vMotion network stack. These reports do not indicate a connectivity problem.
If you do not want to reconfigure all vMotion VMkernel adapters, you can set both health checks to silent by using RVC. If you are unfamiliar with RVC, see this article. Use the following commands to set both checks to silent:
- vsan.health.silent_health_check_configure -a vmotionpingsmall
- vsan.health.silent_health_check_configure -a vmotionpinglarge
> vsan.health.silent_health_check_configure -a vmotionpingsmall vc.virten.lab/Datacenter/computers/vSAN65/
Successfully add check "vMotion: Basic (unicast) connectivity check" to silent health check list for vSAN65
> vsan.health.silent_health_check_configure -a vmotionpinglarge vc.virten.lab/Datacenter/computers/vSAN65/
Successfully add check "vMotion: MTU check (ping with large packet size)" to silent health check list for vSAN65
Both health checks should now appear as skipped.
To remove checks from the silent list, use the -r option:
- vsan.health.silent_health_check_configure -r vmotionpingsmall
- vsan.health.silent_health_check_configure -r vmotionpinglarge
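If you have more than one cluster, typing these commands by hand gets tedious. The following sketch only builds the RVC command lines as text so you can paste them into an RVC session; it does not talk to vCenter. The helper name silent_check_commands is made up, and the cluster path is the one from my lab.

```python
# Check IDs as used in the RVC commands in this post.
CHECK_IDS = ("vmotionpingsmall", "vmotionpinglarge")


def silent_check_commands(cluster_path, remove=False):
    """Build RVC command lines that add (-a) or remove (-r) both vMotion checks."""
    flag = "-r" if remove else "-a"
    return [
        f"vsan.health.silent_health_check_configure {flag} {check} {cluster_path}"
        for check in CHECK_IDS
    ]


# Print the commands that silence both checks for the lab cluster.
for command in silent_check_commands("vc.virten.lab/Datacenter/computers/vSAN65/"):
    print(command)
```

Calling the helper with remove=True produces the matching -r commands for the same cluster.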