Let's say you have reports of poor performance affecting multiple virtual machines. You take a look at the virtual machines in question, but everything seems fine: no significant CPU usage, no swapping or ballooning, and almost no disk activity.
[Figure: ESX host with occasional IO latency spikes]
Does the host's disk latency graph look like the first graph, with occasional spikes well over 30 ms?
If so, check the other ESX hosts. Do any of them show similar spikes at the same times as the first host?
In this particular case multiple hosts are affected, so the storage environment is almost certainly saturated.
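Comparing graphs by eye works for two hosts but gets tedious across a cluster. The following is a minimal sketch of the same check: it flags samples above 30 ms and reports the timestamps at which more than one host spiked. It assumes you have already exported per-host latency samples as (timestamp, milliseconds) pairs, for example from the performance charts; the data layout, host names and the functions `spiky_timestamps` and `shared_spikes` are hypothetical and not part of any vSphere API.

```python
from collections import defaultdict

# Assumed threshold, mirroring the "spikes well over 30 ms" discussed above.
LATENCY_THRESHOLD_MS = 30.0


def spiky_timestamps(samples, threshold_ms=LATENCY_THRESHOLD_MS):
    """Return the timestamps at which one host's latency exceeded the threshold.

    `samples` is a list of (timestamp, latency_ms) tuples for a single host.
    """
    return {ts for ts, latency_ms in samples if latency_ms > threshold_ms}


def shared_spikes(latency_by_host, threshold_ms=LATENCY_THRESHOLD_MS, min_hosts=2):
    """Map each timestamp to the sorted list of hosts that spiked at that time,
    keeping only timestamps where at least `min_hosts` hosts spiked together."""
    hosts_at = defaultdict(list)
    for host, samples in latency_by_host.items():
        for ts in spiky_timestamps(samples, threshold_ms):
            hosts_at[ts].append(host)
    return {ts: sorted(hosts)
            for ts, hosts in sorted(hosts_at.items())
            if len(hosts) >= min_hosts}


if __name__ == "__main__":
    # Hypothetical latency samples (timestamp, ms) exported from two hosts.
    latency = {
        "esx01": [("02:00:00", 4.0), ("02:00:20", 85.0), ("02:00:40", 3.5), ("02:01:00", 60.0)],
        "esx02": [("02:00:00", 5.0), ("02:00:20", 120.0), ("02:00:40", 6.0), ("02:01:00", 9.0)],
    }
    print(shared_spikes(latency))  # {'02:00:20': ['esx01', 'esx02']}
```

A spike that shows up on only one host (like esx01 at 02:01:00 here) points at something local to that host, while spikes shared across hosts point at the common storage.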
[Figure: Another ESX host, probably the one causing the issue]
So how do we solve the issue? The graph from the second ESX host is typical of batch jobs such as backups or database imports/exports. These jobs tend to run as fast as possible, so moving to a faster disk array will only shorten the period during which other machines suffer; it will not solve the problem as long as the disk array remains the slowest part of the equation. In other words, the issue is not that two ESX hosts see high-latency IOs, but that one virtual machine on one host is starving all the other machines in the cluster. This is known as the noisy neighbor problem.
A manual approach would be to isolate these bursty workloads on separate disks in a separate datastore, but that quickly becomes time-consuming and cumbersome, especially in large environments. If you run vSphere 4.1 or later and have (or can afford) Enterprise Plus licenses, there is an easier solution: Storage IO Control, or SIOC for short. As soon as the latency on a datastore exceeds a configured threshold, SIOC distributes the available IO capacity fairly among all virtual machines on that datastore according to their disk shares, which prevents a noisy neighbor from severely affecting the other virtual machines running from the same datastore.
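The effect of share-based distribution can be illustrated with a simplified model: once latency crosses the threshold, split the available IOPS in proportion to each VM's shares, and hand any capacity a small VM does not need to the others (weighted max-min fairness). This is a sketch under stated assumptions, not VMware's actual algorithm; the IOPS capacity, the per-VM demands, the equal 1000 shares and the functions `allocate_iops` and `sioc_like` are all hypothetical.

```python
def allocate_iops(capacity_iops, vms):
    """Weighted max-min fair split of capacity_iops.

    vms maps a VM name to (shares, demand_iops); shares must be positive.
    Each VM receives at most its demand; capacity a VM does not need is
    redistributed to the remaining VMs in proportion to their shares.
    """
    allocation = {name: 0.0 for name in vms}
    active = {name for name, (_, demand) in vms.items() if demand > 0}
    remaining = float(capacity_iops)
    while active and remaining > 0:
        total_shares = sum(vms[name][0] for name in active)
        # Offer every still-active VM its share-weighted slice of what is left.
        offers = {name: remaining * vms[name][0] / total_shares for name in active}
        satisfied = {name for name in active if vms[name][1] <= offers[name]}
        if not satisfied:
            # Everyone wants more than their slice: hand out the slices and stop.
            for name in active:
                allocation[name] = offers[name]
            break
        # Fully serve VMs whose demand fits, then redistribute what is left.
        for name in satisfied:
            allocation[name] = float(vms[name][1])
            remaining -= vms[name][1]
            active.remove(name)
    return allocation


def sioc_like(capacity_iops, vms, datastore_latency_ms, threshold_ms=30.0):
    """Below the (assumed) threshold every VM gets its full demand;
    above it, shares decide who gets how much of the capacity."""
    if datastore_latency_ms <= threshold_ms:
        return {name: float(demand) for name, (_, demand) in vms.items()}
    return allocate_iops(capacity_iops, vms)


if __name__ == "__main__":
    # Hypothetical datastore sustaining about 3000 IOPS; three VMs with equal shares.
    vms = {"backup": (1000, 10000), "db": (1000, 800), "web": (1000, 400)}
    print(sioc_like(3000, vms, datastore_latency_ms=85.0))
    # {'backup': 1800.0, 'db': 800.0, 'web': 400.0}
```

In this model the backup VM still gets everything the other two leave unused, so the batch job keeps running at a reasonable rate, but it can no longer push the database and web VMs below their modest demands.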
hth someone,
/jr