Disk latency is the time that it takes to complete a single I/O operation on a block device.

You can view disk latency in Windows Server Performance Monitor, or by running iostat -xd 1 on Linux systems. Statistics are reported and monitored per disk.
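
On Linux, iostat derives its per-disk figures from the kernel counters in /proc/diskstats. As a rough sketch of that calculation (not iostat's actual source), the Python below computes the average latency per completed I/O between two samples. The function names are illustrative, and the field positions assume the layout described in the kernel's I/O statistics documentation.

```python
import time
from typing import Dict, Tuple

# (completed I/Os, milliseconds spent on them) for one device
Counters = Tuple[int, int]


def parse_diskstats(text: str) -> Dict[str, Counters]:
    """Map device name -> (reads + writes completed, ms spent reading + writing)."""
    stats: Dict[str, Counters] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:  # skip blank or short lines
            continue
        name = fields[2]
        reads, read_ms = int(fields[3]), int(fields[6])
        writes, write_ms = int(fields[7]), int(fields[10])
        stats[name] = (reads + writes, read_ms + write_ms)
    return stats


def average_latency_ms(before: Dict[str, Counters],
                       after: Dict[str, Counters]) -> Dict[str, float]:
    """Average ms per completed I/O for each device between two snapshots."""
    result: Dict[str, float] = {}
    for name, (ios_after, ms_after) in after.items():
        if name not in before:
            continue
        ios_before, ms_before = before[name]
        delta_ios = ios_after - ios_before
        if delta_ios > 0:  # idle devices have no latency to report
            result[name] = (ms_after - ms_before) / delta_ios
    return result


def sample_latency(interval_s: float = 1.0,
                   path: str = "/proc/diskstats") -> Dict[str, float]:
    """Take two snapshots interval_s seconds apart and compare them."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return average_latency_ms(before, after)
```

Calling sample_latency() on a Linux host returns a dictionary keyed by device name. Devices that completed no I/O during the interval are omitted rather than reported as zero.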

In general, disk latency varies by type of disk. Magnetic drives have greater access times than solid-state drives (SSDs), and therefore higher latency and larger latency spikes. No absolute number can be given as an example of good or bad disk latency; it depends on your hardware and throughput.

Because of file system, application, and OS caches, temporary increases in disk latency can often be absorbed without incident. However, large statistical deviations in latency, even temporary ones, are often an early sign of disk failure, whether the disk is network-attached or a local block device. Sustained increases have a more lasting impact on the applications running on the system.
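
One simple way to spot a large statistical deviation is to compare each new latency sample against a trailing baseline. The sketch below flags samples that sit more than a chosen number of standard deviations above the mean of the preceding window. The function name, window size, threshold, and the small floor on the standard deviation are all illustrative assumptions, not recommended values.

```python
from statistics import mean, stdev
from typing import List, Sequence


def find_latency_spikes(samples_ms: Sequence[float],
                        window: int = 30,
                        threshold: float = 3.0,
                        min_sigma_ms: float = 0.1) -> List[int]:
    """Return indexes of samples far above the trailing-window baseline.

    A sample is flagged when it exceeds the mean of the previous `window`
    samples by more than `threshold` standard deviations. `min_sigma_ms`
    keeps a perfectly flat baseline from producing a zero divisor.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    spikes: List[int] = []
    for i in range(window, len(samples_ms)):
        baseline = samples_ms[i - window:i]
        mu = mean(baseline)
        sigma = max(stdev(baseline), min_sigma_ms)
        if (samples_ms[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes
```

Only upward deviations are flagged, since lower-than-usual latency is not a concern here. A flagged spike remains in the baseline for the next `window` samples, which temporarily makes the detector less sensitive. A median-based baseline would be more robust if that matters for your workload.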


EFFECTS


Sustained increases and abnormal spikes in disk latency are typically accompanied by:

  • Slow database response times
  • Slow performance in disk-heavy applications like Hadoop
  • Slowness across all endpoints in your application

Replacing faulty or degraded hardware will keep your infrastructure running smoothly.

 

QUICK FIX


Replace the hardware, whether that be a disk or a virtual machine.

 

THOROUGH FIX


If a straight replacement of the hardware doesn’t fix the problem, there are a few things you can try:

  • Reduce access time. If your disk is magnetic, most of the latency can be attributed to moving the drive head and waiting for the platter to rotate into position. Upgrading from magnetic drives to solid-state drives nearly eliminates random access time.
  • Add more IOPS. Using faster-spinning magnetic drives, adding more stripes to your array, or provisioning more IOPS from your cloud provider will shorten the I/O queue and help relieve latency caused by congestion.
  • Alter the RAID settings. Depending on which RAID level you’re using, each I/O operation from the OS can turn into many disk operations. For example, a small random write on RAID 6 typically costs six disk operations (three reads and three writes), because both parity blocks must be updated. If your business case allows for it, consider changing your RAID settings to perform fewer operations.
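
The combined effect of access time, disk count, and RAID level can be estimated with rule-of-thumb arithmetic. The sketch below assumes an idealized model: each magnetic disk serves one random I/O per average seek plus half a rotation, and each host write costs a fixed number of disk operations for the RAID level. The function names are illustrative and the penalty values are the commonly quoted ones. Real arrays with controller caches or full-stripe writes will behave differently.

```python
# Commonly quoted disk operations per small random host write.
RAID_WRITE_PENALTY = {
    "raid0": 1,
    "raid1": 2,
    "raid10": 2,
    "raid5": 4,  # read data + read parity, write data + write parity
    "raid6": 6,  # as RAID 5, with a second parity block
}


def disk_iops(avg_seek_ms: float, rpm: float) -> float:
    """Approximate random IOPS of one magnetic disk."""
    # On average the platter must turn half a revolution to reach the sector.
    rotational_ms = 0.5 * 60_000 / rpm
    return 1000 / (avg_seek_ms + rotational_ms)


def array_iops(per_disk_iops: float, disks: int, raid_level: str,
               write_fraction: float) -> float:
    """Approximate host-visible IOPS of an array for a given write mix."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    raw = per_disk_iops * disks
    penalty = RAID_WRITE_PENALTY[raid_level]
    # Each read costs one disk operation; each write costs `penalty`.
    return raw / ((1 - write_fraction) + write_fraction * penalty)
```

Under these assumptions, a 15,000 RPM disk with a 3 ms average seek delivers about 200 IOPS. Eight such disks at a 50% write mix give roughly 457 host IOPS in RAID 6 and roughly 1,067 in RAID 10, which illustrates why the RAID level matters for write-heavy workloads.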


RESOURCES