Thursday, February 7, 2008

Power Outage Debrief

The DSO experienced a power outage from 10:59 AM to 11:17 AM today, Thursday, February 7. Because our IT infrastructure is centralized in the DSO, this could have been quite a problem. Fortunately, we designed our new datacenter with some elements to keep us up and running during such an outage.

Here's a debriefing of the incident. 

At 10:59 AM, power was lost at the DSO, NKE, and PVE (and possibly others as well). OMS reported a brief flicker.

When the power went out, the UPS units kept all servers, core switches, routers, transceivers, and PBX equipment running until the emergency generator turned on and supplied power. The generator was supplying power within one minute. The datacenter load transferred to the generator power source seamlessly. All VoIP phones in the DSO stayed operational because they use power over ethernet (POE) from the 3750 switch. 

Jon opened the datacenter door with a key because the strike plates did not open when a card was swiped. This was fixed a few minutes after the generator power came online. Control Solutions will add UPS backups to their systems soon to ensure that card access keeps working in an outage.

Desktop computers in the DSO all turned off (except Zach's, because he put his on a UPS). Those in the tech area turned back on when the generator power kicked in.

The WAN connection to OMS stayed up (Kiwi Syslog provides notification of dropped links). That is an improvement over the previous two power "blips," which caused the switches at OMS to reboot and interrupted the network connections for the entire school.

The only WAN connection that went down unexpectedly was the one to the switches at PVE. They should have lasted longer. This was not a crucial issue, since PVE does not use VoIP phones, and all the computers were down anyway. Investigation showed that the UPS powering those switches was too small and too old. Zach replaced it with the one from his desk.

When designing the tech area, we decided to include all the electrical outlets on the generator-supplied circuits so that the tech offices can be used as an emergency headquarters. This was the first extended blackout since then, so I informed other DSO staff that if they had urgent work to do, they could use a desktop computer in our area. 

The idea is that district administrators could bring their laptop computers and phones to the tech area and set up a temporary command center. Our wireless access points use POE to remain active during a blackout, and laptops run on battery power, so they can stay running and connected for quite a while.

Power came back online at 11:17 AM. The generator turned off at 11:27 AM. Desktop computers in the tech area stayed up during the cutover back to regular power.

In the old scenario without an emergency generator, all district phones would have gone dead around 11:05. All server connections would have been lost around 11:10. 
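To put rough numbers on that comparison, here is a small Python sketch that works out the durations from the times reported in this post. The variable and function names are made up for illustration, and the "old scenario" times are the estimates given above, not measurements.

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two same-day clock times written as HH:MM."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Times taken from this debrief (all AM, same day).
OUTAGE_START = "10:59"
UTILITY_RESTORED = "11:17"
GENERATOR_OFF = "11:27"

# Estimated failure times in the old setup with no generator.
PHONES_DEAD_OLD = "11:05"
SERVERS_DEAD_OLD = "11:10"

outage_minutes = minutes_between(OUTAGE_START, UTILITY_RESTORED)
generator_overlap = minutes_between(UTILITY_RESTORED, GENERATOR_OFF)

# How long battery power alone would have carried each system.
phone_runtime = minutes_between(OUTAGE_START, PHONES_DEAD_OLD)
server_runtime = minutes_between(OUTAGE_START, SERVERS_DEAD_OLD)

# Downtime we would have seen before utility power returned.
phones_down = outage_minutes - phone_runtime
servers_down = outage_minutes - server_runtime

print(f"Outage length: {outage_minutes} min")
print(f"Generator kept running after power returned: {generator_overlap} min")
print(f"Old setup: phones down {phones_down} min, servers down {servers_down} min")
```

By these figures, the outage lasted 18 minutes, and the old setup would have left phones dead for about 12 of them and servers unreachable for about 7.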

I'm very happy with how everything worked out, especially considering how bad this would have been without the emergency generator.
