CpuIowait | Blue Matador - Troubleshooting

I/O wait time is reported as a CPU metric, but it does not indicate a CPU problem. It points to I/O throughput issues instead. I/O wait time is a subcategory of CPU idle time: if there is other work to do, the kernel context switches to it and the CPU reports user or system time instead of I/O wait time.

For a given CPU, the I/O wait time is the time during which that CPU was idle (i.e. didn’t execute any tasks) and there was at least one outstanding disk I/O operation requested by a task scheduled on that CPU (at the time it generated that I/O request).
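On Linux, this per-CPU counter is exposed as the fifth numeric field of each cpu line in /proc/stat (after user, nice, system, and idle). The following is a minimal sketch, not a definitive implementation, of turning two snapshots of that file into an I/O wait percentage per CPU. The function names (parse_cpu_times, iowait_percent, sample_iowait) are hypothetical and chosen for illustration.

```python
import time


def parse_cpu_times(stat_text):
    """Return {cpu_name: [tick counters]} from the text of /proc/stat."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Only the "cpu" (aggregate) and "cpuN" (per-CPU) lines are relevant.
        if fields and fields[0].startswith("cpu"):
            times[fields[0]] = [int(value) for value in fields[1:]]
    return times


def iowait_percent(before, after):
    """Return {cpu_name: iowait share in percent} between two snapshots."""
    result = {}
    for cpu, new in after.items():
        old = before.get(cpu)
        if old is None:
            continue
        deltas = [n - o for n, o in zip(new, old)]
        # Sum user..steal only (first 8 fields); guest time is already
        # counted inside user and nice.
        total = sum(deltas[:8])
        # Index 4 is iowait: user, nice, system, idle, iowait, ...
        result[cpu] = 100.0 * deltas[4] / total if total > 0 else 0.0
    return result


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart (Linux only)."""
    with open(path) as f:
        before = parse_cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_times(f.read())
    return iowait_percent(before, after)
```

Calling sample_iowait() on a Linux host returns a dictionary keyed by "cpu" (all CPUs combined) and "cpu0", "cpu1", and so on. Because I/O wait is a subcategory of idle time, a low value on a busy CPU does not rule out an I/O bottleneck.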

Effects

On an otherwise healthy system, an unexpected increase in I/O wait time can indicate that at least one disk is beginning to fail. If not remedied, you can expect:

Decreased disk throughput
Disk failure
Data loss

Quick Fix

If the affected machine is a replaceable VM or cloud server, terminate it and launch a new one.

Thorough Fix

Look at the utilization and throughput of each disk, especially across striped or RAID sets, for a single disk whose numbers differ markedly from those of its peers. When you find the faulty disk, replace it.
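On Linux, one way to make this comparison is to diff two snapshots of /proc/diskstats. The sketch below is an illustration under stated assumptions rather than a definitive tool: the function names are hypothetical, and the rule "utilization above twice the median of the set" is an arbitrary threshold chosen for the example, not a recommendation from the sources listed under Resources. Note that /proc/diskstats also lists partitions, so you may want to pass in only the member disks of the striped or RAID set.

```python
import statistics

SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors


def parse_diskstats(text):
    """Return {device: counters} from the text of /proc/diskstats."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue
        # f[0], f[1] = major, minor; f[2] = device name
        stats[f[2]] = {
            "sectors_read": int(f[5]),
            "sectors_written": int(f[9]),
            "io_ms": int(f[12]),  # milliseconds spent doing I/O
        }
    return stats


def disk_rates(before, after, interval_s):
    """Return per-device utilization (%) and throughput (bytes/s)."""
    rates = {}
    for name, new in after.items():
        old = before.get(name)
        if old is None:
            continue
        sectors = (new["sectors_read"] - old["sectors_read"]) + (
            new["sectors_written"] - old["sectors_written"]
        )
        rates[name] = {
            "util_percent": 100.0 * (new["io_ms"] - old["io_ms"])
            / (interval_s * 1000.0),
            "bytes_per_s": sectors * SECTOR_BYTES / interval_s,
        }
    return rates


def suspect_disks(rates, factor=2.0):
    """Return devices whose utilization exceeds factor x the set's median.

    The factor is an assumed, illustrative threshold.
    """
    if len(rates) < 2:
        return []
    median_util = statistics.median(r["util_percent"] for r in rates.values())
    return sorted(
        name
        for name, r in rates.items()
        if r["util_percent"] > 0 and r["util_percent"] > factor * median_util
    )
```

A disk that is much busier than its peers while moving the same or less data is the pattern to look for: in a striped set every member should see a similar load, so an outlier suggests that one device is servicing requests slowly.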

Resources

The precise meaning of I/O wait time in Linux (Veithen on GitHub)
Understanding Linux CPU stats (Scout App)
Understanding CPU Usage in Linux (OpsDash)
Windows Processor Counters (Microsoft TechNet)
Windows Performance Counters Explained (AppAdminTools)
Troubleshooting High I/O Wait in Linux (Benjamin Cane)
