I/O wait time is a CPU metric, but it doesn’t indicate CPU problems. It counts time the CPU spends idle while waiting on I/O, so it points to I/O throughput issues instead. I/O wait time is a subcategory of CPU idle time: if there is other work to do, the kernel context switches to it and the CPU reports user or system time instead of I/O wait time.

For a given CPU, the I/O wait time is the time during which that CPU was idle (i.e. didn’t execute any tasks) while at least one disk I/O operation was still outstanding. The operation must have been requested by a task that was scheduled on that CPU at the time it issued the request.
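
As a rough illustration of the definition above, the sketch below computes per-CPU I/O wait as a share of elapsed CPU time from two samples of /proc/stat on Linux. It assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names and the one-second sampling interval are illustrative choices, not part of any standard tool.

```python
import time


def parse_cpu_times(stat_text):
    """Return {cpu_name: [user, nice, system, idle, iowait, irq, softirq, steal]}."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu"):
            # Keep only the first eight counters; guest time is already
            # included in user time, so adding it would double count.
            times[fields[0]] = [int(v) for v in fields[1:9]]
    return times


def iowait_percent(before, after):
    """Return {cpu_name: iowait share of elapsed CPU time, in percent}."""
    result = {}
    for cpu, new in after.items():
        old = before.get(cpu)
        if old is None:
            continue
        deltas = [n - o for n, o in zip(new, old)]
        total = sum(deltas)
        iowait = deltas[4] if len(deltas) > 4 else 0  # fifth counter is iowait
        result[cpu] = 100.0 * iowait / total if total > 0 else 0.0
    return result


def sample_iowait(interval_s=1.0, path="/proc/stat"):
    """Read the counters twice, interval_s apart, and report iowait percent."""
    with open(path) as f:
        before = parse_cpu_times(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_cpu_times(f.read())
    return iowait_percent(before, after)
```

Calling sample_iowait() on a Linux host returns one entry for the aggregate "cpu" line and one per core ("cpu0", "cpu1", and so on). Because I/O wait is carved out of idle time, a busy CPU can show a low value even while the disks are saturated.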


EFFECTS


On an otherwise healthy system, an unexpected increase in I/O wait time with no matching increase in workload can indicate that at least one disk is beginning to fail. If not remedied, you can expect:

  • Decreased disk throughput
  • Disk failure
  • Data loss

 

QUICK FIX


If the server is a replaceable VM or cloud instance, terminate it and launch a new one.

 

THOROUGH FIX


Compare the utilization and throughput of each disk, especially across the members of a striped or RAID set, and look for one disk that performs noticeably worse than its peers. When you find the faulty disk, replace it.
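
One way to make that comparison is sketched below, assuming a Linux host that exposes /proc/diskstats in its usual layout (device name in the third column, sector counts in 512-byte units, time spent doing I/O in the thirteenth column). The function names and the "three times the median latency" threshold are hypothetical choices for illustration; tune them to your own hardware.

```python
from statistics import median


def parse_diskstats(text):
    """Return {device: cumulative counters} from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue
        stats[f[2]] = {
            "ios": int(f[3]) + int(f[7]),       # reads + writes completed
            "sectors": int(f[5]) + int(f[9]),   # sectors read + written
            "io_ms": int(f[6]) + int(f[10]),    # ms spent reading + writing
            "busy_ms": int(f[12]),              # ms the device had I/O in flight
        }
    return stats


def disk_report(before, after, interval_s):
    """Turn two samples taken interval_s apart into per-disk rates."""
    report = {}
    for name, new in after.items():
        old = before.get(name)
        if old is None:
            continue
        d = {key: new[key] - old[key] for key in new}
        report[name] = {
            "util_pct": 100.0 * d["busy_ms"] / (interval_s * 1000.0),
            "mb_per_s": d["sectors"] * 512 / 1e6 / interval_s,
            "await_ms": d["io_ms"] / d["ios"] if d["ios"] else 0.0,
        }
    return report


def suspect_disks(report, factor=3.0):
    """Return disks whose average latency exceeds factor x the median latency."""
    latencies = [r["await_ms"] for r in report.values() if r["await_ms"] > 0]
    if len(latencies) < 2:
        return []
    typical = median(latencies)
    return sorted(n for n, r in report.items() if r["await_ms"] > factor * typical)
```

Pass in only the devices that belong to the same striped or RAID set, since /proc/diskstats also lists partitions and unrelated devices that would skew the median. A disk that shows high utilization and high latency while moving less data than its peers is the likely candidate for replacement.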


RESOURCES