and Midwestern parts of the United States, along with Ontario, Canada, faced a
major power outage on Thursday, August 14, 2003, at about 4:10 p.m. Eastern
Daylight Time (EDT). At the time, it was the second-largest blackout in history,
affecting nearly 10 million people in Ontario and approximately 45 million
people in the states of New Jersey, Ohio, Connecticut, Michigan, Massachusetts,
Pennsylvania, Vermont and New York.
The main cause behind this blackout was a bug in GE Energy’s alarm system in
the control room of FirstEnergy Corporation, a company based in Akron, Ohio.
Because of this bug, no alarm warned the operators. As a result, FirstEnergy
system operators were unaware that the transmission lines had been overloaded
and that a race condition had been triggered in the energy management system
(EMS) software. The operators therefore took no action, and what would otherwise
have been an easily manageable local outage became a widespread blackout. The
bug was buried so deep in the code that it took weeks of analyzing millions of
lines of code and data to find and fix it.
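The text does not describe the defect in detail, so the following is only a
generic illustration of what a race condition is and how it is usually avoided.
It is not the actual EMS code. The names AlarmLog, record and run_writers are
hypothetical. Several threads write to one shared alarm log, and a lock makes
each update a single indivisible step so that concurrent writers cannot leave
the log in an inconsistent state.

```python
import threading


class AlarmLog:
    """Hypothetical shared alarm log written by several threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self.events = []
        self.count = 0

    def record(self, event):
        # The lock makes the append and the counter update one atomic step,
        # so concurrent writers cannot interleave between the two updates.
        with self._lock:
            self.events.append(event)
            self.count += 1


def run_writers(log, writers=4, per_writer=1000):
    """Start several writer threads and wait for all of them to finish."""

    def work(writer_id):
        for i in range(per_writer):
            log.record((writer_id, i))

    threads = [threading.Thread(target=work, args=(w,)) for w in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log.count
```

Without the lock, the two updates in record could interleave across threads.
Such timing-dependent faults are hard to reproduce, which is consistent with
the weeks of analysis mentioned above.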
Another important cause of this software failure was that the software had no
failure detection module for the alarm system. The alarm software had already
failed at 14:14, but the IT staff remained unaware of the failure for the next
40 minutes, until the second EMS server stopped functioning. Even then, the
staff thought that only the server had failed, not realizing that the alarm
system had stopped working about 40 minutes earlier. This happened because
FirstEnergy Corporation had no provision for periodic diagnostics of the alarm
processor, which would have helped detect the alarm system failure. The computer
support staff at the FirstEnergy control center restored the server soon
afterward but did not fully test all the functions of the application, so they
were still unaware of the alarm system failure. In addition, the FirstEnergy
operators lacked an effective alternative way to visualize and analyze the
condition of the system. With such a capability, the operators would readily
have noticed the alarm system failure and could have warned MISO and
neighboring systems, putting them on alert to monitor conditions more closely.
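The kind of periodic diagnostic that was missing can be illustrated with a
heartbeat watchdog. This is a minimal sketch under stated assumptions, not a
description of FirstEnergy's or GE Energy's software. The class AlarmWatchdog,
its methods and the 60-second timeout are hypothetical. The alarm processor is
assumed to call heartbeat() whenever it completes a processing cycle, and a
separate monitor calls is_stalled() periodically.

```python
import time


class AlarmWatchdog:
    """Tracks heartbeats from an alarm processor and flags a stall."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock          # injectable so the check can be tested
        self._last_beat = clock()

    def heartbeat(self):
        # Called by the alarm processor each time it finishes a cycle.
        self._last_beat = self._clock()

    def is_stalled(self):
        # True once no heartbeat has arrived within the timeout window.
        return self._clock() - self._last_beat > self.timeout_s


def demo():
    """Simulate a processor that stops sending heartbeats."""
    now = [0.0]                      # fake clock, in seconds
    watchdog = AlarmWatchdog(timeout_s=60, clock=lambda: now[0])

    now[0] = 30.0
    watchdog.heartbeat()             # last sign of life at t = 30 s

    now[0] = 80.0
    healthy_check = watchdog.is_stalled()   # 50 s of silence, within timeout

    now[0] = 200.0
    stalled_check = watchdog.is_stalled()   # 170 s of silence, past timeout
    return healthy_check, stalled_check
```

In a real control room, a stalled result would need to reach the operators
through a channel independent of the alarm system being monitored. Otherwise
the warning would be lost along with the alarms.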
In some areas, power was restored by 11 p.m., but most of the affected areas
remained without power for 2 to 4 days, and in some parts of Ontario it took
about a week to get the power back. The estimated cost of the outage was around
$5 billion to $10 billion. Canada’s gross domestic product fell by 0.7% in
August, owing to the loss of 18.9 million work hours caused by the blackout.