Best practices imply a platonic or idealised form: the pinnacle, if you will, of configuration, where cost ranks below resilience and availability in the order of priorities. Good practices, by contrast, take a more practicable approach and balance capability against cost (also known as trade-offs!).
On that note, this post covers the good/best practice of implementing a management cluster as part of your VMware virtual infrastructure platform. A management cluster is a dedicated set of hosts (per site) configured as a VMware vSphere HA/DRS cluster. It provides resources for the virtual machines that deliver management functions for the virtual infrastructure. These include (not all will apply to every site): Active Directory (or similar), DNS/DHCP, a general management server (for network and SAN tools etc.), vCenter Server, a database server for management databases, the web client server, the single sign-on server, Site Recovery Manager, the vSphere Replication server, vCenter Operations Manager (and other components of the suite), vCloud Director cells (and related components), vCloud Networking and Security Manager, and data protection servers (e.g. CommVault VSA).
The primary purpose of the management cluster is to separate the resources used to manage virtualisation workloads from those workloads themselves (a workload here being anything that is not management, so this includes dev etc.). In the event of a site failure and subsequent recovery, the management cluster is the first thing you bring up, which means the management functions come back first and faster.
So not only should the compute resources be separated (i.e. a separate cluster), but consideration should also be given to separating the storage provisioned to the management cluster from the production storage, in much the same way that you would not keep backup copies on the same storage systems as production data. Separate storage may be a dedicated array; the vSphere Storage Appliance (VSA), which shares local storage across a cluster of hosts, is another option (although this needs further discussion, as the VSA has some limitations).
A further justification for establishing a management cluster is that the amount of resources required to service the management components can easily equate to the resources available from one or more hosts. If these resources are provisioned from production clusters then this a) leaves less resource available for production workloads and b) makes capacity planning for production workloads more difficult (i.e. as more management capability is added, such as metering and billing at a later date, fewer resources are available).
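To make the "one or more hosts" point concrete, here is a minimal sizing sketch. Every figure in it is a hypothetical assumption for illustration only: the VM sizes, the host specification and the vCPU-to-core overcommit ratio are not vendor recommendations and should be replaced with numbers from your own environment.

```python
from dataclasses import dataclass


@dataclass
class ManagementVM:
    name: str
    vcpus: int
    ram_gb: int


# Hypothetical management VM sizes (assumptions, not vendor sizing guidance).
MANAGEMENT_VMS = [
    ManagementVM("vCenter Server", 4, 16),
    ManagementVM("Management database", 4, 16),
    ManagementVM("Single sign-on", 2, 4),
    ManagementVM("Active Directory / DNS", 2, 4),
    ManagementVM("Site Recovery Manager", 2, 8),
    ManagementVM("vCenter Operations Manager", 8, 32),
]

# Hypothetical host specification and overcommit ratio (assumptions).
HOST_CORES = 12
HOST_RAM_GB = 64
VCPUS_PER_CORE = 4


def hosts_equivalent(vms, host_cores, host_ram_gb, vcpus_per_core):
    """Return the management footprint expressed as a number of hosts.

    The footprint is whichever is larger: the share of host CPU or the
    share of host RAM that the management VMs would consume.
    """
    total_vcpus = sum(vm.vcpus for vm in vms)
    total_ram_gb = sum(vm.ram_gb for vm in vms)
    cpu_share = total_vcpus / (host_cores * vcpus_per_core)
    ram_share = total_ram_gb / host_ram_gb
    return max(cpu_share, ram_share)


footprint = hosts_equivalent(
    MANAGEMENT_VMS, HOST_CORES, HOST_RAM_GB, VCPUS_PER_CORE
)
print(f"Management footprint: {footprint:.2f} hosts")
```

With these assumed figures memory is the constraint, and the management components alone already need more than one host's worth of RAM. That capacity would otherwise be taken from a production cluster.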
The management cluster should ideally be a three-node cluster per site (best practice), although it is common to relax this and run a two-node N+1 cluster where N is 1 (good practice, since we still have a management cluster). Ideally the same virtual networking should be used across management and production clusters (distributed virtual switch yes, but Nexus 1000V possibly not unless you are implementing the physical Nexus 1100 appliance).
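The N+1 arithmetic can be sketched as follows. This is an illustrative calculation only: the function names are hypothetical, the management load is assumed to be expressed in "hosts' worth" of capacity, and it does not model how vSphere HA admission control actually reserves failover capacity.

```python
from math import ceil


def hosts_required(load_in_hosts: float, spare_hosts: int = 1) -> int:
    """N hosts to carry the load, plus spare hosts for failover (N+1 by default)."""
    if load_in_hosts <= 0:
        raise ValueError("load_in_hosts must be positive")
    return ceil(load_in_hosts) + spare_hosts


def tolerates_failures(
    cluster_hosts: int, load_in_hosts: float, failures: int = 1
) -> bool:
    """True if the hosts left after `failures` can still carry the full load."""
    return cluster_hosts - failures >= load_in_hosts


# A load that fits on one host: the two-node N+1 cluster (N = 1) is enough.
print(hosts_required(0.8), tolerates_failures(2, 0.8))

# A load larger than one host: two nodes no longer survive a host failure.
print(hosts_required(1.25), tolerates_failures(2, 1.25), tolerates_failures(3, 1.25))
```

The two-node option is only a sound relaxation while the whole management load fits on a single host. Once it grows past that, the third node is needed just to keep N+1.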
In summary, as reliance on the virtual infrastructure continues to rise (think how business critical applications are being targeted more frequently now), increased resilience of the management components becomes critical.
Why not read more from Michael Webster at VMware - with his reply to our blog on management clusters.