
    Services and processes can stop for a wide range of reasons. While stopped, a process cannot perform computations, run background tasks, accept new connections, manage resources, or do anything else. To resume normal operation, the process or service must be started again.

    To qualify for this recommendation, a process must be long-running and non-interactive. The recommendation is resolved once a process with the same name is running again.
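
    On a Linux host, one way to check the resolution condition yourself is to look for a process with the same command name under /proc. This sketch assumes Linux's /proc/<pid>/comm files (the kernel truncates that name to 15 characters); the name "myapp" is hypothetical, and the proc_root parameter exists only so the function can be tested against a fake directory.

```python
import os


def is_process_running(name: str, proc_root: str = "/proc") -> bool:
    """Return True if any process's command name (/proc/<pid>/comm) equals `name`."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    return True
        except OSError:
            # The process may have exited between listdir() and open(), or be unreadable.
            continue
    return False


# Example: is_process_running("myapp")
```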


    Effects


    Reasons for a process stopping include:

    • Manual stopping of the process
    • Running out of memory
    • Unhandled exceptions and errors
    • Unhandled OS signals
    • Buggy code
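
    When a supervisor or script launches the process through Python's subprocess module, the return code already narrows down which of these causes applies. The sketch below assumes a POSIX system, where subprocess reports a process killed by signal N as -N (shells report the same event as 128 + N, e.g. 137 for SIGKILL). A SIGKILL often comes from the kernel OOM killer but can equally be a manual kill -9, so treat the result as a hint to confirm in the logs.

```python
import signal


def describe_exit(returncode: int) -> str:
    """Map a subprocess return code (POSIX) to a likely cause of the stop."""
    if returncode == 0:
        return "exited normally (often a deliberate stop)"
    if returncode < 0:
        # subprocess reports "killed by signal N" as -N.
        try:
            sig = signal.Signals(-returncode)
        except ValueError:
            return f"terminated by unknown signal {-returncode}"
        if sig == signal.SIGKILL:
            return "killed by SIGKILL (manual kill -9 or the kernel OOM killer)"
        return f"terminated by {sig.name}"
    # Positive codes are chosen by the program itself; for example, Python exits
    # with status 1 when an exception goes unhandled.
    return f"exited with error status {returncode}"
```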

    Starting a process is not always safe. Before restarting it, verify whether the process was stopped intentionally.
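
    One lightweight way to make intentional stops visible is a marker file that whoever stops the service creates, ideally containing a short reason. This is a hypothetical convention, not something the service or this recommendation defines, and the path below is illustrative.

```python
from pathlib import Path
from typing import Optional

# Hypothetical convention: whoever stops the service on purpose writes a reason here.
MAINTENANCE_MARKER = Path("/var/run/myapp.maintenance")


def stop_reason(marker: Path = MAINTENANCE_MARKER) -> Optional[str]:
    """Return the recorded reason for an intentional stop, or None if no marker exists."""
    if not marker.exists():
        return None
    reason = marker.read_text().strip()
    return reason or "stopped intentionally (no reason given)"


def safe_to_restart(marker: Path = MAINTENANCE_MARKER) -> bool:
    """Only restart when nobody has flagged the stop as intentional."""
    return stop_reason(marker) is None
```

    Whoever stops the service deletes the marker when the work is done, so the absence of a marker means nobody has claimed the outage.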

    If the process remains stopped, users may encounter:

    • 5xx errors
    • Completely unresponsive services
    • Data loss
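
    A stopped service usually leaves nothing listening on its port, so clients see refused connections or timeouts. As a first check, the sketch below tests whether a TCP port accepts connections; the host and port are placeholders, and an open port only shows that something is listening, not that the service is healthy.

```python
import socket


def accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out (socket.timeout is an OSError subclass), unreachable, etc.
        return False


# Example: accepts_connections("127.0.0.1", 8080)
```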


    Quick Fix


    Check with your team whether anyone is managing this server manually. If not, start the process or service as appropriate.
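
    If the service is managed by systemd (an assumption; adapt this for your init system or process manager), starting it and confirming that it came up can be scripted as below. `systemctl start` and `systemctl is-active --quiet` are standard systemctl commands; the unit name is hypothetical, and the `run` parameter exists only so the function can be exercised without root access.

```python
import subprocess
from typing import Callable


def start_unit(unit: str, run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> bool:
    """Start a systemd unit and report whether it is active afterwards."""
    # `systemctl start` asks systemd to start the unit; check=True raises on failure.
    run(["systemctl", "start", unit], check=True)
    # `systemctl is-active --quiet` prints nothing and exits 0 only if the unit is active.
    result = run(["systemctl", "is-active", "--quiet", unit])
    return result.returncode == 0


# Example: start_unit("myapp.service")
```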


    Thorough Fix


    If the service was stopped intentionally, better team communication is in order. Use a Slack channel, an email group, or an announcement to give notice of planned outages.

    If the service should not have been stopped, comb through log files, monitoring tools, and audit logs for the root cause, focusing on stack traces and errors.
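
    Logs only help if the process records why it stopped. For a Python service (an assumption; other runtimes have equivalents), the sketch below writes uncaught exceptions with their full stack trace to the log and records which signal requested shutdown. SIGKILL cannot be caught, so OOM kills will not show up here; check the kernel log instead, for example with `journalctl -k` or `dmesg`. The logger name "service" is hypothetical.

```python
import logging
import signal
import sys

log = logging.getLogger("service")


def log_uncaught(exc_type, exc, tb):
    """sys.excepthook replacement: log the stack trace of any uncaught exception."""
    log.critical("uncaught exception, process exiting", exc_info=(exc_type, exc, tb))


def log_signal(signum, frame):
    """Signal handler: record which signal stopped us, then exit with status 128 + N."""
    log.warning("received %s, shutting down", signal.Signals(signum).name)
    sys.exit(128 + signum)


def install_stop_logging():
    """Call once at startup, from the main thread (signal.signal requires it)."""
    sys.excepthook = log_uncaught
    signal.signal(signal.SIGTERM, log_signal)
```

    At startup, configure logging (for example with `logging.basicConfig(level=logging.INFO)`) and then call `install_stop_logging()`. Exiting with 128 + N mirrors the status a shell reports for a signal-terminated process.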


    Resources