ServerBeach Downtime Explanation
ServerBeach have issued a report on the downtime that affected all of their customers earlier this week. We have posted it in full in case you have not received their e-mail yet.
Background:
Over the past week, we have been conducting our annual maintenance on our power infrastructure in our Virginia data center. This began on Thursday, June 22nd and was completed on Sunday, June 25th.
What Happened?
Starting at approximately 11:53 am CDT on Sunday, June 25th, there was an anomaly in the power feeding the network equipment and some customer servers. The power anomaly happened 30 minutes after the ATS (Automatic Transfer Switch) was put into bypass mode to do testing on the system. We are still investigating the true cause, but we strongly suspect a power frequency fluctuation as the catalyst. (The other possibility is a lightning strike from the severe thunderstorms which were moving through Northern Virginia at that time.) Our UPS is in place, but was not able to smooth out or eliminate the power fluctuation as expected. We will post additional details on the cause of the fluctuation as they arise.
What happened next?
The power fluctuation caused both core routers and both distribution routers (which are paired for redundancy) to reboot simultaneously. Both core routers (COR1 and COR2) came back online as expected. However, one of the distribution routers (DIS1) did not come back up after the reboot because of a corrupt flash card. All traffic failed over to the second distribution router (DIS2), which then handled the traffic for all of the VLANs in the data center. In addition, a second failure occurred: one of the five GigE cards failed to initialize on DIS2, causing a longer network outage for the limited number of VLANs serviced by that GigE card. Those VLANs were inaccessible until the fault was discovered and the failing card replaced. This process was complete by 3:09 pm CDT. For a basic diagram of this network structure, please refer to the ServerBeach website.
Bottom line and Next steps?
We take very seriously our job of being a hosting partner to you and understand that network outages, whether caused by bad weather, power glitches or anything else, can have serious negative impacts to your business. On behalf of all of us here at ServerBeach and Peer1, we sincerely apologize for this disruption of service and promise to make changes to eliminate the possibility of this happening again.
With the results of a more detailed investigation, we will determine and communicate to you the corrective actions we are taking. Please stay tuned for more details. In the meantime, if you have any questions or lingering problems with your server(s), please don’t hesitate to call or update this ticket or contact our support team. Our phone number is 1-800-741-9939 option 1.
About this entry
You’re currently reading “ServerBeach Downtime Explanation,” an entry on Business Voyeur
- Published: Friday, June 30th, 2006 at 8:05 pm
- Author: exospectral
- Category: News