
    Your storage disks have limited space, and you’re approaching the maximum allowed on the file system. When investigating, verify which disk, partition, or file system is the problem; more than one may be affected. Start with the biggest problem.

    To check your disk space on Windows, use the Performance Monitor. On Linux, use a combination of du -h / and df -h /.
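
    The same check can be scripted. The following is a minimal sketch using Python's standard shutil.disk_usage; the function name percent_used is our own, and the result may not match the Use% column of df exactly, because df excludes blocks reserved for root from its calculation.

```python
import shutil


def percent_used(path="/"):
    """Return the percentage of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple of total, used, free (bytes)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # Check the root file system; pass another mount point to check a different one.
    print(f"{percent_used('/'):.1f}% used")
```

    Run it once per mount point you care about, since each file system fills up independently.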

    When checking disk space on Linux, keep in mind that there are two competing metrics. Blue Matador monitors df, which quickly reads the file system's metadata. It frequently reports more space used than du, which measures usage by traversing every file on the file system. A common cause of the discrepancy is a process that is still writing to a file that has been deleted: df counts that file as “used” space, while du does not. In these cases, check for open, deleted files using lsof, and if necessary restart the process holding the deleted file handle to release the space.
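
    If lsof is not available, a rough equivalent can be sketched in Python. This is an illustration under stated assumptions, not how Blue Matador or lsof work internally: it is Linux-only, it relies on the kernel marking the /proc/<pid>/fd symlinks of unlinked files with a trailing " (deleted)", and the function name deleted_open_files is our own. Without root privileges it only sees processes you are allowed to inspect.

```python
import os


def deleted_open_files(proc_root="/proc"):
    """Return (pid, fd, target) for open descriptors whose file has been deleted.

    Assumes the Linux /proc layout; returns an empty list if it is absent.
    """
    results = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return results  # no /proc here (for example, not a Linux host)
    for pid in entries:
        if not pid.isdigit():
            continue  # skip non-process entries such as "meminfo"
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while we were scanning
            # The match is textual, so confirm candidates before acting on them.
            if target.endswith(" (deleted)"):
                results.append((int(pid), int(fd), target))
    return results


if __name__ == "__main__":
    for pid, fd, target in deleted_open_files():
        print(pid, fd, target)
```

    Treat the output as a list of candidates: it tells you which process to investigate, not whether restarting it is safe.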

    For ephemeral servers, disk space should be managed using automated cleanup tools. For databases, you should expect to max out the usage, and either scale the disk size, add more disks, or add more server capacity on a regular basis.


    Effects


    The moment a file system runs out of disk space, all file operations requiring additional space will be rejected. Until that point, no adverse effects will manifest. Once the disk is 100% full, you’ll begin to notice at least some of the following:

    • Data loss
    • Applications crashing
    • OS restarting
    • Processes failing to restart
    • Periodic tasks not firing
    • Missing log entries

     

    Quick Fix


    Remove unused files and folders. Your file system is unique; make sure you know what you’re deleting.

    Here’s the prioritized list of our recommended files to delete:

    • Recycle Bin (Windows only)
    • Log files older than 24 hours
    • Large files in user directories

    Be careful when deleting files with “log” in the name. It doesn’t necessarily mean it’s a log file. Many databases have replication logs with “log” in the name.
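
    Before deleting anything, it helps to list the candidates. The sketch below reports, largest first, regular files under a directory that were last modified more than 24 hours ago. It deletes nothing. The function name stale_files and the choice to match on a ".log" suffix are assumptions made for illustration; matching on the suffix rather than on “log” anywhere in the name is one way to avoid picking up files such as database replication logs, but it is not a guarantee.

```python
import os
import stat
import time


def stale_files(root, max_age_hours=24, suffix=".log"):
    """Return (size_bytes, path) for old files under `root`, largest first.

    Only regular files whose names end with `suffix` and whose last
    modification is older than `max_age_hours` are reported. Nothing is deleted.
    """
    cutoff = time.time() - max_age_hours * 3600
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat so symlinks are not followed
            except OSError:
                continue  # file vanished or is unreadable
            if not stat.S_ISREG(info.st_mode):
                continue  # skip symlinks, sockets, and other special files
            if info.st_mtime < cutoff:
                candidates.append((info.st_size, path))
    return sorted(candidates, reverse=True)


if __name__ == "__main__":
    # /var/log is only an example starting point; adjust for your system.
    for size, path in stale_files("/var/log"):
        print(f"{size:>12}  {path}")
```

    Review the list by hand before removing anything, and remember that a file still held open by a process will not free its space until that process closes it.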

     

    Thorough Fix


    First, manage your log files. We recommend sending all your log entries to a centralized log management tool. Find the log management tool that best fits your needs, and use it to offload the CPU, disk space, I/O throughput, and log rotation configuration that local logging would otherwise require.

    Second, use a logical volume manager for your storage. On Windows Server 2016, we recommend Storage Spaces. On Linux, we recommend lvm2. These tools allow you to add and remove disks to increase capacity and fault tolerance without downtime.

    Third, if you have long-running Java applications, be wary of the File.deleteOnExit function. Java does not delete those files until the Java process exits.

    Lastly, it’s surprisingly common for application developers to leave old files lying around. Find out which application is writing the files in question, and let its developers know about the problem.

     

    Resources