[RFO] NODE01.MCO down All the latest from NodeSpace


[RFO] NODE01.MCO down (Resolved)
  • Priority - Critical
  • Affecting Server - NODE01.MCO.NODESPACE.US
  • Reason for Outage (RFO)

    Location:
    NODE01.MCO.DC2

    Date:
    August 26, 2017

    Events:
    11:24 AM (GMT-4) During routine server maintenance, the NODE01.MCO server encountered an issue that caused Apache (HTTP) to stop responding to web requests. Our Linux Admins' attempts to bring Apache back online were unsuccessful, so they rebooted the server under emergency maintenance procedures.
    11:34 AM (GMT-4) The server still had not come back up after the reboot. Our Linux Administration Team asked our NOC team to physically inspect the server.
    12:34 PM (GMT-4) Our NOC team reported that the server's hardware was fine and provided our Linux Team with KVM access.
    1:07 PM (GMT-4) Our Linux Admin Team was able to restore services.

    Root cause of the issue:
    Our Linux Administration team noticed that several critical services were not starting during server boot-up. Upon investigation, the root cause was that SELinux had become enabled in the "permissive" state instead of remaining "disabled".
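
    The RFO does not say which distribution NODE01.MCO runs or how SELinux came to be enabled. As a minimal sketch, assuming a RHEL/CentOS-style system where the boot-time mode is set by the SELINUX= line in /etc/selinux/config, a check like the following could flag a configured mode other than "disabled" before a reboot. The function names are hypothetical, and the sketch reads only the config file text, not kernel boot parameters such as selinux=0 that can also affect the mode.

```python
from typing import Optional

# Assumption: RHEL/CentOS-style /etc/selinux/config, where the boot-time
# SELinux mode is set by a line of the form SELINUX=<mode>.
VALID_MODES = {"enforcing", "permissive", "disabled"}


def configured_selinux_mode(config_text: str) -> Optional[str]:
    """Return the mode set by the last active SELINUX= line, or None."""
    mode = None
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        # Match the key exactly so SELINUXTYPE= is ignored.
        if sep and key.strip() == "SELINUX":
            mode = value.strip().strip('"').lower()
    return mode if mode in VALID_MODES else None


def mode_matches(config_text: str, expected: str = "disabled") -> bool:
    """Hypothetical pre-reboot check: does the configured mode match `expected`?"""
    return configured_selinux_mode(config_text) == expected


if __name__ == "__main__":
    sample = "SELINUX=permissive\nSELINUXTYPE=targeted\n"
    print(configured_selinux_mode(sample))  # permissive
    print(mode_matches(sample))             # False
```

    Parsing the file text rather than calling SELinux tools keeps the check testable on any machine; on a live server the text would come from reading /etc/selinux/config.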

    Resolution:
    Our Linux Administration team booted the affected server into single-user mode, disabled SELinux, and then rebooted the server normally.
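
    The RFO does not detail how SELinux was disabled. Purely as an illustration, assuming RHEL/CentOS-style /etc/selinux/config syntax, the change amounts to setting the SELINUX= line to "disabled". The helper below is hypothetical and works on the file's text so it can be tested without root access. On such systems, switching SELinux to disabled only takes effect after a reboot, which is consistent with the normal reboot described above.

```python
from typing import List, Tuple


def disable_selinux_in_config(config_text: str) -> Tuple[str, bool]:
    """Return (new_text, changed) with every active SELINUX= line set to 'disabled'.

    Assumption: RHEL/CentOS-style /etc/selinux/config syntax. Comments and
    other keys such as SELINUXTYPE= are left untouched. If no active SELINUX=
    line exists, one is appended. `changed` is True when the configured mode
    was not already 'disabled'.
    """
    out_lines: List[str] = []
    found = False
    changed = False
    for line in config_text.splitlines():
        stripped = line.strip()
        key, sep, value = stripped.partition("=")
        if not stripped.startswith("#") and sep and key.strip() == "SELINUX":
            found = True
            if value.strip().strip('"').lower() != "disabled":
                changed = True
            out_lines.append("SELINUX=disabled")
        else:
            out_lines.append(line)
    if not found:
        out_lines.append("SELINUX=disabled")
        changed = True
    return "\n".join(out_lines) + "\n", changed


if __name__ == "__main__":
    new_text, changed = disable_selinux_in_config("SELINUX=permissive\nSELINUXTYPE=targeted\n")
    print(changed)   # True
    print(new_text, end="")
```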

    Action plan:
    Our Linux Administration team will closely monitor the server, review any automatically installed updates, and work with the upstream software vendors to ensure that this does not happen again.

    Total downtime:
    Customers on NODE01.MCO experienced 1 hour and 43 minutes of downtime. This downtime is eligible for an SLA credit.
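
    The downtime figure follows from the start and end times in the event timeline (11:24 AM to 1:07 PM, both GMT-4). A short Python check of that arithmetic, where format_duration is an illustrative helper rather than part of any NodeSpace tooling:

```python
from datetime import datetime, timedelta

# Start and end times from the RFO; both are GMT-4, so the difference
# needs no timezone conversion.
START = datetime(2017, 8, 26, 11, 24)
END = datetime(2017, 8, 26, 13, 7)


def format_duration(delta: timedelta) -> str:
    """Render a timedelta as 'H hour(s), M minute(s)', ignoring seconds."""
    total_minutes = int(delta.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    h_unit = "hour" if hours == 1 else "hours"
    m_unit = "minute" if minutes == 1 else "minutes"
    return f"{hours} {h_unit}, {minutes} {m_unit}"


downtime = END - START
print(format_duration(downtime))  # 1 hour, 43 minutes
```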

    We sincerely apologize for the inconvenience and will do our best to reduce or eliminate future occurrences of this problem.

  • Date - 2017-08-26 11:24 - 2017-08-26 13:07
  • Last Updated - 2017-08-26 13:51


Comments


Joe   08/26/2017 11:45 PM
Thank you for the transparency. I'm happy that you guys were able to resolve the issue quickly and kept customers in the loop.