Ntirety - Newark, DE - Network Event – Incident details

Newark, DE - Network Event

Resolved
Major outage
Started about 1 month ago
Lasted about 16 hours

Affected

Ntirety Customer Support

Operational from 8:01 AM to 1:33 PM, Major outage from 1:33 PM to 7:14 PM, Partial outage from 1:33 PM to 6:05 PM, Operational from 6:05 PM to 12:21 AM

Phone Systems

Operational from 8:01 AM to 1:33 PM, Partial outage from 1:33 PM to 6:05 PM, Operational from 6:05 PM to 12:21 AM

Monitoring Services

Operational from 8:01 AM to 1:33 PM, Major outage from 1:33 PM to 7:14 PM, Operational from 7:14 PM to 12:21 AM

Desktop-as-a-Service

Operational from 8:01 AM to 1:33 PM, Partial outage from 1:33 PM to 7:14 PM, Operational from 7:14 PM to 12:21 AM

Newark, DE

Operational from 8:01 AM to 1:33 PM, Partial outage from 1:33 PM to 7:14 PM, Operational from 7:14 PM to 12:21 AM

Shared Storage

Operational from 8:01 AM to 1:33 PM, Partial outage from 1:33 PM to 7:14 PM, Operational from 7:14 PM to 12:21 AM

Updates
  • Resolved

    We have completed the maintenance that restores management access for our operations teams and allows our provisioning engines to function within the portal once again. During the maintenance we observed no network anomalies. Our teams will continue to review the incident and will provide a root cause analysis, which will be shared with all customers via support tickets once available.

  • Update

    As part of our recovery from the Newark (NWK01) networking event, we are continuing our maintenance to restore management access to all switches. This effort will begin at 6 PM Eastern Time. We will perform the work in waves, monitoring the network after each wave before continuing. We will update this status once the maintenance is complete.

  • Update

    We are currently pausing any additional switch changes aimed at restoring management access so that we can review and monitor for other causes. During this pause, our support teams' ability to make firewall changes may be impacted until we submit a change request, which will occur after 6 PM Eastern Time today (March 6th, 2025).

  • Monitoring

    We have implemented a fix for the downstream devices that were causing a broadcast storm. During initial troubleshooting, however, we removed our management network from the NWK01 zone. To restore full management capabilities, we are adding the management network back to the switches and monitoring traffic stability.

  • Update

    After removing the uplinks associated with the downstream devices and removing a problematic virtual chassis, we have seen significant improvement in network stability. We continue to work with customers on individual issues resulting from the network event.

  • Update

    We have identified a broadcast storm involving a single virtual chassis. We are currently removing uplinks from the downstream devices we have identified and will monitor for stability.

  • Update

    We resolved the issue seen on the distribution router. Unfortunately, stability has not returned to the rest of the customer environment. We are continuing our investigation into the network loss and instability.

  • Update

    We have identified additional errors within the distribution router and are currently testing a resolution. We will provide an update with those findings shortly.

  • Update

    We currently see stability for a subset of customers; however, we still see saturation within the network and are working to identify the cause.

  • Update

    We have removed problematic VLANs from routing, which has restored customer connectivity/stability.

  • Update

    We are performing an intrusive test against the aggregate switches associated with NWK01. This test may cause further instability in NWK01 before traffic is restored.

  • Update

    We have further isolated the issue to our aggregate switches and are in contact with our hardware support vendors to continue diagnosing and resolving the errors. We will have another update in 20 minutes. The changes we make on the current switches do not impact the secondary environment in Newark, which remains stable.

  • Identified

    We are continuing to investigate a network looping event that is causing sporadic network connectivity in a subsection of our Newark environment. We will have an in-depth update in 30 minutes.

  • Update

    The scope of the issue has been narrowed down to a single switching domain in the Newark DC.

  • Investigating

    We are currently investigating a problem with network connectivity.