
    Byte and packet rates on the network are an excellent measure of application availability and health. In the age of connectivity and microservices, network throughput is one of the most reliable statistics. Small spikes are expected, but large spikes and sustained changes usually indicate other issues on the server. This recommendation takes past trends, growth, and decay into account.
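How this recommendation models past trends is not described here. As a rough illustration only, the sketch below separates a brief spike from a sustained change by comparing each new throughput sample against a rolling baseline. The function name, the window size, the threshold, and the `sustain` count are all assumptions chosen for the example, not the product's actual logic.

```python
from statistics import mean, stdev

def classify_samples(samples, window=10, threshold=3.0, sustain=3, min_sigma=1.0):
    """Label each sample after the first `window` samples.

    Hypothetical helper: a sample "deviates" when it is more than
    `threshold` standard deviations from the baseline mean. A run of
    `sustain` or more consecutive deviations is labeled "sustained";
    shorter runs are labeled "spike". Assumes window >= 2.
    """
    if len(samples) <= window:
        return []
    baseline = list(samples[:window])
    labels = []
    run = 0
    for value in samples[window:]:
        mu = mean(baseline)
        # Floor the spread so a perfectly flat baseline does not flag tiny changes.
        sigma = max(stdev(baseline), min_sigma)
        if abs(value - mu) > threshold * sigma:
            run += 1
            labels.append("sustained" if run >= sustain else "spike")
        else:
            run = 0
            labels.append("normal")
            # Only normal samples update the baseline, so a shift does not
            # contaminate the reference it is being compared against.
            baseline.pop(0)
            baseline.append(value)
    return labels
```

Keeping deviating samples out of the baseline is a deliberate choice in this sketch: it lets a level shift keep registering as a deviation instead of being absorbed into the average after one or two samples.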

    Because this recommendation is general in nature, the best response is to check vitals and other metrics for possible problems. Use your other monitoring tools to find issues with code, load, users, scaling, or other infrastructure.

    This recommendation almost never indicates a hardware error; the cause is almost always software. It is also rarely a false positive, even though it does not point to a specific root cause.

     

     

    Effects


    While searching for problems, make sure you look at some of the most common causes:

    • The server dropped out of a load balancer
    • A new code release fundamentally changed the traffic patterns
    • An administrator changed a firewall or network configuration
    • A critical scaling trigger was hit, so more servers are needed

     

    Quick Fix


    Verify the critical applications on this server are still responsive.
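One simple way to run this check is to confirm that each critical application still accepts TCP connections. The sketch below is a minimal example using only the Python standard library; the helper names and the host/port pairs in the usage comment are placeholders, and a successful connection shows only that the port is open, not that the application is answering requests correctly.

```python
import socket

def is_responsive(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False

def check_all(endpoints, timeout=2.0):
    """Map each (host, port) pair to its responsiveness."""
    return {(host, port): is_responsive(host, port, timeout) for host, port in endpoints}

# Example call with placeholder endpoints (replace with your own services):
# check_all([("localhost", 443), ("localhost", 5432)])
```

For HTTP services, an application-level health check gives a stronger signal than a bare port check.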

     

    Thorough Fix


    Compare server vitals against deployments, configuration changes, infrastructure changes, user logins, and automated remediation. For every inflection point, identify the root cause. If one seems out of place, investigate and remedy as appropriate.
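The comparison described above can be partly automated. The following sketch assumes you can export throughput as a list of `(timestamp, value)` samples and your change history as a list of `(timestamp, description)` events; both formats, the function names, and the thresholds are assumptions made for illustration. It flags inflection points and lists the events that preceded each one, so that points with no matching event stand out for investigation.

```python
from datetime import timedelta

def find_inflection_points(series, min_change=0.5):
    """Return timestamps where throughput changed by more than `min_change`
    (as a fraction of the previous sample) from one sample to the next."""
    points = []
    for (_, prev), (ts, cur) in zip(series, series[1:]):
        # Samples following a zero reading are skipped to avoid division by zero.
        if prev > 0 and abs(cur - prev) / prev > min_change:
            points.append(ts)
    return points

def match_events(points, events, window=timedelta(minutes=15)):
    """For each inflection point, list the events that occurred up to
    `window` before it. An empty list means no known change explains it."""
    report = {}
    for ts in points:
        report[ts] = [
            description
            for when, description in events
            if timedelta(0) <= ts - when <= window
        ]
    return report
```

An inflection point that maps to an empty list is the kind that "seems out of place" and is worth investigating first.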

     

    Resources