VMware’s VSAN has many virtues as a solution. However, with the wide range of HCL-compliant hardware and software configurations, there are bound to be some problems. One of these is described in VMware KB 2135494, which is specific to running VSAN on Dell servers with the PERC H730 storage controller. The KB primarily describes the recommended I/O controller driver and firmware combination to avoid drive failure states.
Towards the end of the KB, there is also a recommendation to adjust certain I/O timeout and retry behaviours.
I don’t enjoy going through each host and manually updating advanced configuration parameters, so I put together the two PowerCLI one-liners below to apply these settings across a specified cluster.
Get-Cluster -name "Your Cluster Name" | Get-VMhost | Get-AdvancedSetting -name LSOM.diskIoTimeout | Set-AdvancedSetting -Value 110000 -Confirm:$false
Get-Cluster -name "Your Cluster Name" | Get-VMhost | Get-AdvancedSetting -name LSOM.diskIoRetryFactor | Set-AdvancedSetting -Value 1 -Confirm:$false
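After running the two commands, it is worth confirming that every host actually picked up the new values. The following is a minimal sketch of one way to check, assuming an existing Connect-VIServer session and the same placeholder cluster name used above; the `$settings` variable and the `Host` column label are only illustrative names.

```powershell
# Assumption: already connected to vCenter with Connect-VIServer,
# and "Your Cluster Name" is a placeholder for the real cluster name.
$settings = 'LSOM.diskIoTimeout', 'LSOM.diskIoRetryFactor'

# Read both advanced settings back from every host in the cluster
# and list them per host so mismatched values stand out.
Get-Cluster -Name "Your Cluster Name" | Get-VMHost |
    Get-AdvancedSetting -Name $settings |
    Select-Object @{Name = 'Host'; Expression = { $_.Entity.Name }}, Name, Value |
    Sort-Object Host, Name |
    Format-Table -AutoSize
```

Each host should show two rows, one per setting, with the values set by the one-liners. A host with fewer rows or different values was most likely unreachable or skipped when the settings were applied.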