Disk Space | Blue Matador - Troubleshooting

Your storage disks have limited space, and you’re approaching the maximum allowed on the file system. When investigating, verify which disk, partition, or file system is the problem; more than one may be affected. Start with the biggest problem first.

To check your disk space on Windows, use the Performance Monitor. On Linux, use a combination of df -h /, which reports usage for the whole file system, and du -h /, which reports usage per directory.
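As a rough sketch of the Linux check, the shell functions below wrap df and du. The names usage_percent and largest_dirs are made up for this example, and the -d option assumes a reasonably recent GNU or BSD du.

```shell
# Hypothetical helpers; the names are illustrative, not part of any tool.

# Print the used-space percentage (digits only) for the file system holding $1.
# -P requests POSIX output, which keeps each file system on a single line.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print the N largest entries (default 10) from a one-level du report on $1,
# sizes in KiB, smallest first. The listing includes the total for $1 itself.
# -x keeps du on one file system; -d 1 limits the report to one level.
largest_dirs() {
    du -xk -d 1 "$1" 2>/dev/null | sort -n | tail -n "${2:-10}"
}
```

For example, usage_percent / prints the percentage that df reports for the root file system, and largest_dirs /var 5 suggests where to look next.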

When checking disk space on Linux, keep in mind that there are two competing metrics. Blue Matador monitors df, which quickly checks the metadata on the file system. It frequently reports more data used than du, which measures usage by traversing all the files on the file system. One common cause of the discrepancy is a process that still holds open a file that has been deleted: df counts that file as “used” space, while du does not see it. In these cases, check for open, deleted files using lsof, and potentially restart the process holding the deleted file handle to release the space.
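To illustrate the open-but-deleted case, here is a minimal sketch that scans /proc on Linux for file descriptors whose target has been removed. The function name open_deleted_files is hypothetical. Where lsof is installed, lsof +L1 reports similar information; the /proc version is shown only because it needs no extra packages.

```shell
# List open file descriptors whose target has been deleted (Linux /proc only).
# Output columns: PID, fd path, link target (which ends in " (deleted)").
# Without root, only descriptors of your own processes are readable.
# Roughly equivalent where lsof is installed: lsof +L1
open_deleted_files() {
    for fd in /proc/[0-9]*/fd/*; do
        # Skip descriptors that vanished or that we may not read.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *" (deleted)")
                pid=${fd#/proc/}   # strip the leading /proc/
                pid=${pid%%/*}     # keep only the PID component
                printf '%s\t%s\t%s\n' "$pid" "$fd" "$target"
                ;;
        esac
    done
}
```

The PID column identifies the process to restart if the space needs to be released.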

For ephemeral servers, disk space should be managed using automated cleanup tools. For databases, you should expect to max out the usage, and either scale the disk size, add more disks, or add more server capacity on a regular basis.

Effects

The moment a file system runs out of disk space, all file operations requiring additional space will be rejected. Until that point, no adverse effects will manifest. Once the disk is 100% full, you’ll begin to notice at least some of the following:

Data loss
Applications crashing
OS restarting
Processes failing to restart
Periodic tasks not firing
Missing log entries

Quick Fix

Remove unused files and folders. Your file system is unique; make sure you know what you’re deleting.

Here’s the prioritized list of our recommended files to delete:

Recycle Bin (Windows only)
Log files older than 24 hours
Large files in user directories

Be careful when deleting files with “log” in the name. The name alone doesn’t mean the file is a disposable log file. Many databases have replication logs with “log” in the name.
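Before deleting anything, it can help to list the candidates first. The sketch below uses find to print, not remove, log files older than 24 hours and large files under a directory. The function names, the *.log pattern, and the 100 MiB default are assumptions for illustration; the -mmin test and the M size suffix are supported by GNU and BSD find but are not in POSIX.

```shell
# Print *.log files under $1 last modified more than 24 hours (1440 minutes) ago.
# -xdev keeps find on one file system. Matching on the .log suffix skips names
# such as binlog.000001, but still review the list before deleting anything.
old_log_files() {
    find "$1" -xdev -type f -name '*.log' -mmin +1440 -print
}

# Print regular files under $1 larger than $2 MiB (default 100).
large_files() {
    find "$1" -xdev -type f -size +"${2:-100}"M -print
}
```

For example, old_log_files /var/log and large_files /home 500 each produce a list to review by hand.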

Thorough Fix

First, manage your log files. We recommend sending all your log entries to a centralized log management tool. Find the log management tool that best fits your needs, and use it to take over the CPU, disk space, I/O throughput, and log rotation configuration that local logging would otherwise consume.

Second, use a logical volume manager for your storage. On Windows Server 2016, we recommend Storage Spaces. On Linux, we recommend LVM2. These tools allow you to add and remove disks to increase capacity and fault tolerance without any downtime.
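On Linux, growing a volume with LVM2 typically takes three commands. The sketch below only prints them unless RUN=1 is set, because they require root and modify disks. The function name extend_volume and the device, volume group, and logical volume names in the usage example are hypothetical placeholders; adapt them to your system.

```shell
# Print, or with RUN=1 execute, the LVM2 commands that add a disk to a volume
# group and grow a logical volume into the new space.
#   $1 = new disk, $2 = volume group, $3 = logical volume
extend_volume() {
    disk=$1 vg=$2 lv=$3
    # pvcreate : initialize the disk as an LVM physical volume
    # vgextend : add that physical volume to the volume group
    # lvextend : grow the logical volume into all free extents; -r also
    #            resizes the file system on it
    for cmd in \
        "pvcreate $disk" \
        "vgextend $vg $disk" \
        "lvextend -r -l +100%FREE /dev/$vg/$lv"
    do
        if [ "${RUN:-0}" = 1 ]; then
            $cmd || return 1
        else
            printf '%s\n' "$cmd"
        fi
    done
}
```

Running extend_volume /dev/sdb vg0 data prints the three commands for review. pvcreate overwrites the start of the disk, so confirm the device name before running with RUN=1.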

Third, if you have long-running Java applications, be wary of the File.deleteOnExit method. Java does not delete those files until the JVM shuts down, so in a long-running process they accumulate.

Lastly, it’s surprisingly common for application developers to leave old files lying around. Find out which application is writing the files in question, and let its developers know about the problem.

Resources

Why DU And DF Display Different Values On Linux And Unix (Linux and Shell Account Blog)
5 Best Places to Free Up Server Disk Space (SherWeb)
Windows Server 2012: Enabling Disk Cleanup Utility (Microsoft TechNet)
Check Disk Space on Linux with the Commands df and du (Lifewire)
Storage Spaces Direct in Windows Server 2016 (Windows IT Pro Center)
An Introduction to LVM Concepts, Terminology, and Operations (DigitalOcean)
Windows Performance Counters Explained (AppAdminTools)