
    Given a minimum, maximum, and desired number of instances, an auto-scaling group automatically launches and terminates cloud servers based on rules around time, load, or queued work. When the number of running servers is lower than the desired number, the auto-scaling group is in a “sick” state.
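
    As a minimal sketch of the definition above, the check below compares the number of servers in service with the desired count. The `GroupStatus` type and its field names are hypothetical and do not come from any particular cloud SDK.

```python
from dataclasses import dataclass


@dataclass
class GroupStatus:
    """Hypothetical snapshot of an auto-scaling group's size settings."""
    min_size: int
    max_size: int
    desired: int
    in_service: int  # servers that launched and are currently serving


def is_sick(status: GroupStatus) -> bool:
    """A group is "sick" when fewer servers are in service than desired."""
    return status.in_service < status.desired


def shortfall(status: GroupStatus) -> int:
    """Number of servers still missing (0 when the group is healthy)."""
    return max(0, status.desired - status.in_service)
```

    A monitoring job could call `is_sick` on each poll and alert once the shortfall has persisted for longer than a normal launch takes.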


    Effects


    Here is a list of common causes, in decreasing order of likelihood:

    • Bad configuration. If newly launched servers don’t initialize properly, they’ll shut themselves down, lock up, or immediately fail health checks.
    • Disabled auto-scaling actions. If a server is terminated while auto-scaling actions are disabled, the auto-scaling group will not replace that server.
    • Failed application health checks. If a new release or a configuration change causes the servers to fail health checks, the group will be stuck in a constant cycle of launching replacement servers.
    • Cloud error during relaunch. If your cloud provider is unable to launch additional servers because of errors on its side or a capacity shortage, auto-scaling actions will have no effect.
    • Grace period is too short. If your servers take longer to come into rotation than the grace period allows, they will be shut down before they are ever marked healthy, and the auto-scaling group will never be able to recover.
    • Spot instance bid too low. If your bid is below the current spot price, the auto-scaling group will not be able to launch more servers.
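
    The grace-period cause in the list above can be illustrated with a toy simulation. It is a sketch under simplifying assumptions, not a model of any provider's scheduler: every server takes the same time to boot, a server that is ready exactly at the grace period counts as healthy, and the function and parameter names are made up for illustration.

```python
def simulate_group(desired, boot_seconds, grace_seconds, rounds):
    """Return (servers in service, total launches) after a number of scaling rounds."""
    in_service = 0
    launches = 0
    for _ in range(rounds):
        missing = desired - in_service
        if missing == 0:
            break  # group is healthy, nothing left to launch
        launches += missing
        # Servers that are not ready within the grace period fail their
        # health checks and are terminated, so they never count as in service.
        if boot_seconds <= grace_seconds:
            in_service += missing
    return in_service, launches


# Boot time exceeds the grace period: the group keeps launching and never recovers.
stuck = simulate_group(desired=3, boot_seconds=300, grace_seconds=120, rounds=4)

# Boot time fits inside the grace period: the group recovers in one round.
healthy = simulate_group(desired=3, boot_seconds=90, grace_seconds=120, rounds=4)
```

    In the first call every round launches three servers that are terminated again, which is the launch-and-terminate loop described in the grace-period item.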

     

    Quick Fix


    Disable all auto-scaling actions so the problem doesn’t get worse. Manually launch more instances and associate them with the auto-scaling group.

    If your cloud provider rejects requests to launch new servers, repurpose existing, low-load servers.

    Watch your cloud provider’s status page for any reported errors. Re-enable all auto-scaling actions once the problem has subsided.
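
    The order of operations in the quick fix can be sketched as follows. `ScalingClient` is a hypothetical wrapper, and its method names (`suspend_actions`, `resume_actions`, `attach`) are assumptions made for illustration. They would need to be mapped onto whatever calls your cloud provider's SDK actually offers.

```python
from __future__ import annotations

from typing import Iterable, Protocol


class ScalingClient(Protocol):
    """Hypothetical provider wrapper; method names are illustrative only."""

    def suspend_actions(self, group: str) -> None: ...
    def resume_actions(self, group: str) -> None: ...
    def attach(self, group: str, server_ids: list[str]) -> None: ...


def quick_fix(client: ScalingClient, group: str, server_ids: Iterable[str]) -> None:
    """Freeze scaling first, then add manually launched or repurposed servers."""
    client.suspend_actions(group)
    ids = list(server_ids)
    if ids:
        client.attach(group, ids)


def recover(client: ScalingClient, group: str, provider_healthy: bool) -> bool:
    """Re-enable scaling only once the provider-side problem has subsided."""
    if provider_healthy:
        client.resume_actions(group)
    return provider_healthy
```

    Keeping `recover` separate from `quick_fix` makes the re-enable step an explicit decision, taken only after the status page shows the incident is over.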

     

    Thorough Fix


    Test your auto-scaling configuration and machine images to make sure instances come into rotation appropriately. Automate your cloud configuration changes to avoid errors in the future.

    Adjust your health check grace period and auto-scaling cooldown to match how long new servers take to pass health checks.
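
    One way to choose that timing value is to derive it from measured boot times. The sketch below takes the slowest observed time and adds a safety margin. The function name and the 25% default margin are assumptions for illustration, not a provider recommendation.

```python
import math


def suggested_grace_seconds(boot_seconds, margin=0.25):
    """Return the slowest observed boot time plus a safety margin, rounded up."""
    if not boot_seconds:
        raise ValueError("need at least one measured boot time")
    return math.ceil(max(boot_seconds) * (1 + margin))


# Example: three measured launches, in seconds from launch to first passing health check.
grace = suggested_grace_seconds([140, 180, 200])
```

    Measuring again after each significant change to the machine image or startup scripts helps keep the setting from drifting below the real boot time.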

    If applicable, rethink your spot instance pricing strategy. Many organizations bid far more than the on-demand price to ensure their spot instances are never terminated abruptly.


    Resources