VMware EVC design recommendation

I personally recommend enabling Enhanced vMotion Compatibility (EVC) as a best practice. However, it is also necessary to know the exceptions to this rule and when not to use it.
This post describes the arguments behind this recommendation, which come from real challenges in production.

EVC overview

This topic is already well documented.
If you are not already familiar with it, please read the official documentation by following these links:
EVC and CPU Compatibility FAQ
Enhanced vMotion Compatibility (EVC) processor support
VMware EVC Mode Explained

Real challenges in production

Adding a newer-generation server to a cluster of older servers

Scenario:
A cluster based on servers with V1-generation CPUs (Intel Sandy-Bridge).
The hardware vendor had stopped selling this model, so it was only possible to buy new servers with V2 CPUs (Intel Ivy-Bridge).
There was no argument to justify the cost of replacing all V1 servers with V2 servers.
Therefore, for a transition period, the cluster had to run with mixed CPU generations.
At first, only one new server was added to the cluster.

Potential issue without EVC:
All VMs started on the V2 server automatically use all the features available in its processors (Ivy-Bridge).
However, it will not be possible to migrate them to the other hosts in the cluster, because Sandy-Bridge is missing some of those features.
If maintenance is needed on that host, all VMs using V2 features will have to be shut down.
This has a direct impact on AVAILABILITY. Not surprisingly, customers do not like planned downtime…

Prevent the issue with EVC:
Enable EVC “Intel Sandy-Bridge” on the cluster before adding the new host.
All VMs started on the V2 host will see only the “Intel Sandy-Bridge” features.
It will then be possible to migrate them across the hosts without any issues.
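The rule behind this fix is that a cluster's usable EVC baseline is capped by its oldest CPU generation. A minimal sketch, assuming a simplified, ordered list of Intel EVC mode names (spelled as in this post; the vSphere Client shows longer labels) and hypothetical host names, not a VMware API:

```python
from typing import Mapping

# Simplified, ordered list of Intel EVC modes (oldest first), for illustration only.
INTEL_EVC_MODES = [
    "Intel Nehalem",
    "Intel Westmere",
    "Intel Sandy-Bridge",
    "Intel Ivy-Bridge",
    "Intel Haswell",
    "Intel Broadwell",
]


def highest_common_evc(host_max_modes: Mapping[str, str]) -> str:
    """Return the highest EVC mode that every host in the cluster can provide.

    host_max_modes maps a host name to the newest EVC mode its CPU supports.
    The baseline is capped by the oldest CPU generation in the cluster.
    """
    if not host_max_modes:
        raise ValueError("the cluster has no hosts")
    lowest = min(INTEL_EVC_MODES.index(mode) for mode in host_max_modes.values())
    return INTEL_EVC_MODES[lowest]


# Hypothetical host names for the mixed-generation scenario above.
cluster = {
    "esx01": "Intel Sandy-Bridge",  # V1 server
    "esx02": "Intel Sandy-Bridge",  # V1 server
    "esx03": "Intel Ivy-Bridge",    # new V2 server
}
print(highest_common_evc(cluster))  # Intel Sandy-Bridge
```

Adding the Ivy-Bridge host does not change the result; only removing every Sandy-Bridge host would let the baseline rise.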

Migrate vCenter to a host with lower EVC

This is the challenge that prompted this post.
The goal is to help other VMware users avoid similar issues.

Scenario:
vCenter 6.0
The vCenter VM runs in a cluster without EVC.
The destination cluster is configured with the highest EVC level available (Intel Haswell).
The servers in the two clusters are from different vendors, but all have E5-2690 v4 CPUs.

Issue:
Attempting a vMotion from the source to the destination cluster fails with the following error:
The target host does not support the virtual machine’s current hardware requirements.
To resolve CPU incompatibilities, use a cluster with Enhanced vMotion Compatibility (EVC) enabled. See KB article 1003212.
3DNow! PREFETCH and PREFETCHW are unsupported.
com.vmware.vim.vmfeature.cpuid.smap
com.vmware.vim.vmfeature.cpuid.adx
com.vmware.vim.vmfeature.cpuid.rdseed
com.vmware.vim.vmfeature.cpuid.rtm
com.vmware.vim.vmfeature.cpuid.hle

This doesn’t seem logical: the destination uses the highest EVC mode available, and the processors are identical.


Root cause:
The E5-2690 v4 belongs to the “Broadwell” generation, as listed in the HCL.
The vCenter VM is therefore using the “Intel Broadwell” features, so it cannot be migrated to a cluster with EVC “Intel Haswell”.
The maximum EVC level supported with vCenter 6.0 is “Intel Haswell”.
(KB 1003212 was actually incorrect and listed EVC “Intel Broadwell” as compatible with vCenter 6.0. It was fixed very quickly after my feedback.)
For vCenter 6.5, however, the maximum is “Intel Broadwell”.
This explains why picking the highest EVC level available was still not enough for vMotion.
The E5-2690 v4 processors should be considered a “future generation” for vCenter 6.0.
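The failure boils down to a set check: every CPU feature the running VM already uses must also be exposed by the destination. A minimal sketch, assuming hypothetical and heavily reduced feature sets: a few shared features plus the Broadwell features named in the vMotion error above (shortened labels, with “prefetchw” standing for the 3DNow! PREFETCH/PREFETCHW line). Real EVC baselines mask many more CPUID bits.

```python
# Hypothetical, partial feature sets for illustration only.
HASWELL_EVC_BASELINE = {"avx", "avx2", "fma"}
# Features reported as unsupported by the EVC "Intel Haswell" destination.
BROADWELL_EXTRA = {"adx", "rdseed", "smap", "rtm", "hle", "prefetchw"}
BROADWELL_HOST = HASWELL_EVC_BASELINE | BROADWELL_EXTRA


def blocking_features(vm_features: set, destination_features: set) -> list:
    """Features the running VM already uses but the destination cannot expose."""
    return sorted(vm_features - destination_features)


# vCenter VM powered on in the source cluster without EVC: it sees the full Broadwell host.
print(blocking_features(BROADWELL_HOST, HASWELL_EVC_BASELINE))
# ['adx', 'hle', 'prefetchw', 'rdseed', 'rtm', 'smap']

# Same VM, had it been powered on in a cluster with EVC "Intel Haswell".
print(blocking_features(HASWELL_EVC_BASELINE, HASWELL_EVC_BASELINE))  # []
```

A non-empty result is what blocks vMotion; an empty one means the destination can present everything the VM already relies on.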

How this could have been prevented with EVC:
If the source cluster had been configured from the beginning with EVC “Intel Haswell”, this problem would not have occurred.

Options available to fix the issue:
1 – Do not enable EVC on the new cluster.
I do not like this option; it guarantees future problems.
2 – Migrate vCenter to the new cluster with downtime.
I am planning to follow the KB “How to enable EVC in vCenter Server”, but with a twist that will make the migration easier.
It will be covered in a future post.
3 – Upgrade to vCenter 6.5.
If vCenter is upgraded to 6.5, it becomes possible to use EVC “Intel Broadwell”.
This would also help to migrate other VMs in the same situation.
However, this kind of operation cannot be performed without good justification and planning.

Availability and Manageability VS Performance

The two previous scenarios demonstrate that enabling EVC has a real positive effect on availability and manageability.

On the negative side, however, EVC “may” impact performance.
Does Enhanced vMotion Compatibility (EVC) Affect Performance?
The concepts in this post are still valid today, even though higher EVC modes now exist.
The key point is “corner case”: some applications may benefit from the latest CPU features, but that is the exception rather than the rule.
When using EVC, choose the highest level possible given your requirements and constraints, to reduce the performance impact of the “hidden” features.
For example, if the cluster contains only one older-generation host, it may be worth removing it to raise the EVC level for all the other hosts.

Conclusion

As a best practice, I personally recommend enabling EVC for all hosts connected to vCenter.
If a host is not part of a cluster, create a single-host cluster just to be able to enable EVC on it.

For each cluster, aim for the highest EVC level that fits the constraints and anticipated future changes.
This means that the “highest technically possible” level is not necessarily the best for the design.
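One way to reason about the target level is to take the lowest of several caps: the current hosts, any servers expected to join later, and the vCenter version. A minimal sketch, assuming a simplified mode ordering, the vCenter maximums stated in this post (6.0 → Haswell, 6.5 → Broadwell), and a hypothetical `planned_hosts` parameter for future changes; it is a design aid, not a VMware tool:

```python
from typing import Iterable

# Simplified ordering of Intel EVC modes, oldest first.
EVC_ORDER = ["Intel Sandy-Bridge", "Intel Ivy-Bridge", "Intel Haswell", "Intel Broadwell"]

# Maximum EVC mode per vCenter version, as described in this post.
VCENTER_MAX_EVC = {"6.0": "Intel Haswell", "6.5": "Intel Broadwell"}


def target_evc(current_hosts: Iterable[str], planned_hosts: Iterable[str],
               vcenter_version: str) -> str:
    """Highest EVC mode compatible with current hosts, planned hosts and vCenter.

    Each host is given as the newest EVC mode its CPU supports. planned_hosts
    (hypothetical) covers servers expected to join the cluster later.
    """
    limits = [*current_hosts, *planned_hosts, VCENTER_MAX_EVC[vcenter_version]]
    return EVC_ORDER[min(EVC_ORDER.index(mode) for mode in limits)]


broadwell_hosts = ["Intel Broadwell"] * 4
print(target_evc(broadwell_hosts, [], "6.0"))  # Intel Haswell (capped by vCenter 6.0)
print(target_evc(broadwell_hosts, [], "6.5"))  # Intel Broadwell
# A single old host pulls the whole cluster down...
print(target_evc(broadwell_hosts + ["Intel Sandy-Bridge"], [], "6.5"))  # Intel Sandy-Bridge
# ...and so does an older server planned to join later.
print(target_evc(broadwell_hosts, ["Intel Ivy-Bridge"], "6.5"))  # Intel Ivy-Bridge
```

The first line shows why a Broadwell cluster managed by vCenter 6.0 should start at “Intel Haswell” rather than at the highest mode the CPUs could technically offer.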

In the end, if performance is “good enough” and availability and manageability are good, the design is doing its job… and you will make friends with the people who operate the infrastructure later 😉

However, there are some scenarios where disabling it should be considered:
If the design requirements prioritize performance above all, carefully review the applications and ideally run performance tests with and without EVC.
If applications have been identified that would benefit from CPU features not available even in the highest EVC mode.
If performance is not “good enough”: as part of performance troubleshooting, consider raising or disabling EVC in the tests.

Finally, it is easier to move from an EVC mode to a higher EVC mode or to no EVC than the other way around.
So it is better to start with EVC and, if needed, raise the EVC level or remove EVC later.
However, keep in mind that a VM needs a power cycle to access the newest CPU features.
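The power-cycle caveat can be pictured as a VM that captures the cluster's EVC baseline when it powers on and keeps it until its next power-on. A toy model, assuming hypothetical `Cluster` and `VirtualMachine` classes that only mirror this behaviour; they are not part of any VMware SDK:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    evc_mode: str  # EVC baseline currently configured on the cluster


@dataclass
class VirtualMachine:
    name: str
    cpu_baseline: Optional[str] = None  # CPU feature baseline captured at power-on

    def power_on(self, cluster: Cluster) -> None:
        # The VM picks up the cluster's current EVC baseline only at power-on.
        self.cpu_baseline = cluster.evc_mode

    def power_off(self) -> None:
        self.cpu_baseline = None

    def power_cycle(self, cluster: Cluster) -> None:
        # Full power off / power on: the only point where the baseline changes.
        self.power_off()
        self.power_on(cluster)


cluster = Cluster(evc_mode="Intel Haswell")
vm = VirtualMachine("app01")  # hypothetical VM name
vm.power_on(cluster)

cluster.evc_mode = "Intel Broadwell"  # EVC raised while the VM keeps running
print(vm.cpu_baseline)  # Intel Haswell

vm.power_cycle(cluster)
print(vm.cpu_baseline)  # Intel Broadwell
```

Raising the EVC level is therefore non-disruptive for running VMs, but each VM only benefits after its next power cycle, which needs to be planned like any other maintenance.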

Bonus design consideration:
In addition to all the standard considerations when selecting a vCenter version, consider the EVC modes available.
EVC “Intel Broadwell” gives vCenter 6.5 an advantage over vCenter 6.0.
