Training, Post-Failure Procedures Should Inform O&M Manuals

  February 4, 2015




Two important issues in developing site-specific operations and maintenance (O&M) manuals are training and post-failure procedures. Here is a closer look at both.

1. Comprehensive training mitigates risk during a failure: Use emergency operating procedures (EOPs) as the basis for equipment-specific operator training before occupancy. Initial training should be completed during construction commissioning, videotaped, witnessed, and verified by a third party (usually the commissioning agent).

Don't stop there: Continuous training helps mitigate 75 percent of failures attributable to human error. Drills, continuous operator training, and simulations within a live operational environment are critical to ensuring that operators will respond correctly in the event of a failure.

One global financial institution conducts an annual black-start drill in every one of its data centers, with detailed test scripts, training, monitoring, and restart back-up support.

Understandably, not every owner has the appetite for the operational risks involved in this process. An alternative for the risk-averse is the use of computerized simulation programs to train operators on EOPs for the electrical systems and the building automation system (BAS).

2. Understand and include post-failure work process authorization: Although many failures can be avoided using well-documented standard operating procedures (SOPs), maintenance operating procedures (MOPs), and EOPs, coupled with continuous operator training, the reality is that no data center is immune to failure. Operations staff should be aware that restoring normal operations after an emergency requires a work-approval and scheduling process that is typically more rigorous than the one for routine work orders.

In fact, most critical facilities are required to remain on the back-up system while operations engineers conduct an investigation to identify the nature of the failure, ascertain its root cause, and develop mitigation and repair plans and schedules. Post-failure work process authorization typically occurs at a much higher organizational level and receives more scrutiny than a normal repair. For example, a critical facility may remain on emergency generator power for several weeks before the root cause of the failure is determined and the work authorization process is complete.
