Ntirety - Newark, DE - Network Event – Incident details


Newark, DE - Network Event

Resolved
Started 6 days ago · Lasted 5 days

Affected

Ntirety Customer Support

Degraded performance from 8:06 PM to 10:03 PM, Partial outage from 10:03 PM to 11:58 PM, Operational from 11:58 PM to 11:05 PM

Phone Systems

Degraded performance from 8:06 PM to 10:03 PM, Partial outage from 10:03 PM to 11:58 PM, Operational from 11:58 PM to 11:05 PM

Ntirety Cloud Services

Operational from 8:06 PM to 8:36 PM, Partial outage from 8:36 PM to 11:58 PM, Degraded performance from 11:58 PM to 9:05 PM, Operational from 9:05 PM to 11:05 PM

Newark, DE

Operational from 8:06 PM to 8:36 PM, Partial outage from 8:36 PM to 11:58 PM, Degraded performance from 11:58 PM to 9:05 PM, Operational from 9:05 PM to 11:05 PM

Networking

Partial outage from 8:06 PM to 11:58 PM, Degraded performance from 11:58 PM to 9:05 PM, Operational from 9:05 PM to 11:05 PM

Newark, DE

Partial outage from 8:06 PM to 11:58 PM, Degraded performance from 11:58 PM to 9:05 PM, Operational from 9:05 PM to 11:05 PM

Updates
  • Resolved

    Ntirety has finalized our root cause analysis and will be sending it to all customers via ticket and email within the hour. If you do not receive the document, please feel free to contact support via phone (866.918.4678), chat, or self-service ticket, and a technician will provide the RCA to you.

  • Update

    We determined that a Layer 2 broadcast storm within the storage area network caused the service degradation event that began yesterday at approximately 3:00 PM ET. Network switches experienced repeated failures during the incident due to excessive packets per second. Once the affected network segment was isolated, we restored stability to the customer environments. We are conducting a root cause analysis and will provide our formalized review by the end of the business day on February 3rd.

  • Monitoring

    The platform has been stable since 18:30 ET. We continue to monitor the platform for any signs of instability and are working with customers individually on any known issues with their virtual machines. The root cause review is still underway, and we will share the RCA with all customers when it is available. Once the root cause is identified, we will move forward with a maintenance to restore redundancy.

  • Identified

    All operational teams continue to restore virtual machines that lost connectivity to their guest OS. In parallel, we continue to investigate the root cause of the identified switch issue. We will not move forward with another emergency change until the root cause is identified; until then, connectivity will remain single-homed.

  • Monitoring

    We have initiated a swap of the identified switch member and are currently monitoring the environment for issues and any further stability concerns. A known subset of virtual machines requires manual intervention to regain access to their OS disk.

  • Identified

    After rebooting a host and, in addition, a member switch, we correlated the issue to the member switch. We are currently replacing that switch and will provide an update once it has been replaced and failed over.

  • Update

    We have isolated the issue to a specific host in an unresponsive state. We are initiating an emergency change to address the host and will update the status.

  • Update

    We continue to investigate the issue and isolate potential causes. The next update will occur at 15:40 ET.

  • Investigating

    We are currently investigating a problem with network connectivity.