OSB:20130708-01

From Digibase Knowledge Base

Latest revision as of 03:15, 28 September 2013

OPERATIONAL STATUS BULLETIN: 20130708-01

Issued: Kradorex Xeron (talk) 23:44, 8 July 2013 (EDT)

In Regards To: Extra-facility Power Outage

Facility: Unicomplex One (Hamilton, ON, Canada)

Affected: *.digibase.ca (all systems, all services)

Ticket #: No internal ticket was issued

Expected Duration: Unknown

Status: Event started at 19:13 and ended at 21:17 on 8 July 2013. Outage was estimated at 2-3 hours.

Situation Description

Starting on 8 July 2013, a power outage affected multiple blocks of the city, including our facility, as described by our electrical utility:

The Power for this outage has been restored on
Monday July 8, 2013 at 9:17 PM

Original Power Outage Date: Monday July 8, 2013
Time: 7:13 PM

Horizon Utilities reports there is presently a limited power outage in the Downtown area of Hamilton affecting 1607 customers.

The cause of the outage is: an underground distribution problem

Horizon Utilities crews have been dispatched to make repairs. The estimated time for power restoration is 10:00 PM. Updates will be posted periodically.

Impact

The incident made our network unavailable to the wider Internet. All primary, secondary and tertiary systems were powered down, and systems running on backup power also had to be powered down.

Updates

21:17, 8 July 2013

Facility primary power was restored.

21:18, 8 July 2013

Work began to cold-start primary systems.

21:30, 8 July 2013

Main computer core (X9CC-ECS) cold-start power-on completed. The system's ACP (Application Compute Processor) remained operational throughout the outage on its own backup power to preserve data integrity, so it did not require cold-start procedures. No data loss occurred.

21:31, 8 July 2013

Central plexus cold-start power-on completed.

21:32, 8 July 2013

Mastercontrol powered and operational.

21:40, 8 July 2013

Main computer core (X9CC-ECS) boot-up procedures completed; system operational.

21:50, 8 July 2013

Unimatrix One declared online and operational.

22:00, 8 July 2013

Secondary systems powered.

22:10, 8 July 2013

Tertiary systems powered.

22:10, 8 July 2013

Situation concluded.
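As a rough cross-check of the 2-3 hour estimate in the Status line, the timeline above can be turned into elapsed times. The helper below is a hypothetical sketch and not part of any Digibase tooling. The event labels (`power_lost`, `power_restored`, and so on) are illustrative names. It assumes every timestamp is local time (EDT) on 8 July 2013, with the start taken from the utility-reported 7:13 PM.

```python
from datetime import datetime

# Timestamps copied from this bulletin; assumed to be local time (EDT) on 8 July 2013.
# The dictionary keys are hypothetical labels chosen for this sketch.
EVENTS = {
    "power_lost": "19:13",        # utility-reported outage start
    "power_restored": "21:17",    # facility primary power restored
    "unimatrix_online": "21:50",  # Unimatrix One declared online
    "tertiary_powered": "22:10",  # last systems powered, situation concluded
}


def at(hhmm: str) -> datetime:
    """Parse an HH:MM string as a time on 8 July 2013."""
    return datetime.strptime(f"2013-07-08 {hhmm}", "%Y-%m-%d %H:%M")


def minutes_between(start: str, end: str) -> int:
    """Whole minutes elapsed between two labelled events in EVENTS."""
    delta = at(EVENTS[end]) - at(EVENTS[start])
    return int(delta.total_seconds() // 60)


if __name__ == "__main__":
    for label in ("power_restored", "unimatrix_online", "tertiary_powered"):
        m = minutes_between("power_lost", label)
        print(f"power_lost -> {label}: {m // 60} h {m % 60:02d} min")
```

Measured from the utility-reported start, facility power was out for 2 h 04 min. Unimatrix One was declared online 2 h 37 min after the outage began, and the last (tertiary) systems were powered 2 h 57 min after it began. All of these fall within the bulletin's 2-3 hour estimate.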