I recently installed a fresh vSphere 8 cluster for one of my customers. When we enabled DRS, the following problems appeared:

vCLS VMs are deployed but stay powered off, causing cluster balancing jobs to fail.
“Cluster Service health” shows a “Degraded” state because vCLS is unavailable and balancing jobs fail.
Re-enabling DRS or toggling Retreat Mode does not solve the issue.
Cause and fix
After some research, I found the following message in the logs:
vCLS-c2cb826a-273a-22dd-93ca-4a86bbcae59f.vmx Power On message: Feature 'cpuid.mwait' was 0, but must be 0x1.
Since vCLS VMs are always deployed with EVC enabled, the issue here is not with the vCLS VMs themselves, but with EVC.
To work properly, EVC requires certain CPU features to be enabled. One of them is MONITOR/MWAIT, which was disabled on our servers.
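If you need to check several log files for the same symptom, a short script can pull out the feature name and the expected value. The following is only a sketch: the pattern is based solely on the single log line quoted above, the function name find_feature_mismatches is made up for illustration, and other builds may word the message differently.

```python
import re

# Pattern derived from the one log line quoted in this post (an assumption:
# other ESXi builds may phrase the message differently).
FEATURE_RE = re.compile(
    r"Feature '(?P<feature>[^']+)' was (?P<actual>[^\s,]+), "
    r"but must be (?P<required>[^\s.]+)"
)


def find_feature_mismatches(lines):
    """Return (feature, actual, required) for each line reporting a mismatch."""
    hits = []
    for line in lines:
        match = FEATURE_RE.search(line)
        if match:
            hits.append(
                (match.group("feature"), match.group("actual"), match.group("required"))
            )
    return hits


# Sample input: the message from this post, nothing else.
sample = [
    "vCLS-c2cb826a-273a-22dd-93ca-4a86bbcae59f.vmx Power On message: "
    "Feature 'cpuid.mwait' was 0, but must be 0x1.",
]

for feature, actual, required in find_feature_mismatches(sample):
    print(f"{feature}: found {actual}, required {required}")
```

To use it on a real file, pass an open file object instead of the sample list, since the function accepts any iterable of lines.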
To enable MONITOR/MWAIT in UEFI, change the following setting:
Under System Settings->Operating Modes, set the Operating Mode to [Custom Mode], and then set MONITOR/MWAIT to [Enable].

After enabling MONITOR/MWAIT in UEFI, all vCLS VMs started and the related errors disappeared.
More details are available on the Lenovo website: https://datacentersupport.lenovo.com/us/en/products/servers/thinksystem/sr950/7x11/solutions/ht510236-configure-thinksystem-for-enhanced-vmotion-compatibility-evc-lenovo-thinksystem