Redundancy Provides the Key to Data Center Reliability
Jun 1, 2000 12:00 PM, By Joseph R. Knisley, Senior Editorial Consultant
Since information technology is becoming the crude oil of the 21st century, its continuous availability is vital to a Georgia data processing center's operation.
Continuous, clean, and uninterrupted power is the lifeblood of any data center, and that is especially true at Total System Services, Inc. in Columbus, Ga. At this 100,000-sq-ft data processing facility, which processes credit card information for major corporations around the clock, you'll find 100% redundancy at every level, from the utility substation all the way down to the Power Distribution Units (PDUs) in the computer room. How do they ensure reliable power? Multiple interconnecting power pathways meet the center's need for high nines of reliability.
To provide continuous operation under all foreseeable circumstances, such as power outages, equipment breakdown, and internal fires, the building's design uses the most modern techniques to enhance reliability. These include redundant systems and components, a standby power generation system and UPS systems, fire detection and suppression systems, moisture detection systems, lightning protection, and central monitoring of major systems.
Generation. The standby generation system consists of four 1250kW diesel generators, providing a total capacity of 5000kW. The design calls for at least one redundant generator to be online whenever the building is on local generation. Each engine generator is dual rated: a higher rating for standby operation and a lower rating for extended operation as a prime power plant.
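The capacity arithmetic behind this arrangement can be sketched in a few lines. The unit count and the 1250kW rating come from the description above; the function name and the reading of "one redundant generator" as a simple N+1 reserve are assumptions made for illustration.

```python
def firm_capacity_kw(unit_kw: float, units: int, redundant: int = 1) -> float:
    """Capacity that remains when `redundant` units are held in reserve."""
    if redundant >= units:
        raise ValueError("need at least one non-redundant unit")
    return unit_kw * (units - redundant)


UNIT_KW = 1250  # rating per generator, from the article
UNITS = 4       # generators installed, from the article

total_kw = UNIT_KW * UNITS                     # installed capacity
firm_kw = firm_capacity_kw(UNIT_KW, UNITS, 1)  # capacity with one unit redundant

print(f"installed: {total_kw} kW, firm (N+1): {firm_kw} kW")
```

Under this reading, keeping one of the four units redundant means the building load on local generation has to fit within 3750kW of the 5000kW installed.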
The generator control system, designed for up to four generating units without modification, provides either completely automatic or manual control. In emergency mode, upon loss of utility power, every available generator starts and independently synchronizes to the bus. As units connect to the bus, loads are added in order of priority.
To avoid overloading the generation system, feeders are added to the bus only if the actual number of online generators can sustain the additional load, which the units share equally. When the utility source is restored, the generating system synchronizes with the utility, smoothly transfers the load back to the utility, and after a time delay, the generators drop off the system.
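The load-adding rule described above can be illustrated with a short sketch. It is not the site's control logic: the feeder names, loads, and priority numbers are hypothetical, and the sketch assumes strict priority order, so that a feeder that does not fit also holds back every lower-priority feeder.

```python
from dataclasses import dataclass


@dataclass
class Feeder:
    name: str
    priority: int   # lower number is added first (assumed convention)
    load_kw: float


def feeders_to_close(feeders, online_units, unit_kw):
    """Return (names of feeders closed, load per online unit in kW)."""
    capacity_kw = online_units * unit_kw
    closed, total_kw = [], 0.0
    for feeder in sorted(feeders, key=lambda f: f.priority):
        if total_kw + feeder.load_kw > capacity_kw:
            break  # wait for more generators before adding this feeder
        closed.append(feeder.name)
        total_kw += feeder.load_kw
    # Online units share the connected load equally.
    share_kw = total_kw / online_units if online_units else 0.0
    return closed, share_kw


# Hypothetical feeders, for illustration only.
FEEDERS = [
    Feeder("UPS A", 1, 1000),
    Feeder("UPS B", 2, 1000),
    Feeder("chillers", 3, 1200),
    Feeder("general building", 4, 900),
]

for units in (2, 4):
    print(units, "units online:", feeders_to_close(FEEDERS, units, 1250))
```

With two 1250kW units online, only the two highest-priority feeders fit within the 2500kW available; once all four units are on the bus, every feeder in this example can be connected.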
In the interruptible mode, the entire building load is removed from the utility system. Using a seamless closed-transition transfer system, the load shifts to the generators without interruption of power and with no perceptible voltage disturbance. This design enables Total System to separate from the utility whenever management anticipates a potential utility problem, such as weather-related events.
To oversee these functions, this arrangement features a master control cubicle and four generator-control cubicles located in the physical plant control room. The master control panel/switchgear array contains a microprocessor to provide the logic for automatic operation.
A full-color, touch screen display, which looks like the meters and gauges on conventional equipment, gives the operator an instantaneous, understandable view of the entire system status.
If the touch screen controller fails, the automation processor remains in operation, keeping the system running. In the unlikely event that both the touch screen and automation processor are simultaneously out of service, the system invokes a backup processor. Finally, workers can use manual control at the discretion of the plant operator.
UPS system. The uninterruptible power supply (UPS) system, located near the computers, operates at 480V. The UPS system consists of two separate switchboards, each fed from a separate utility transformer and a separate generator plant. In turn, each switchboard feeds up to five 500kVA modules. One module is redundant so a module failure will not affect the overall capacity of the system.
The output of each UPS switchboard connects to a "hot tie" switchboard. Under normal operations, the output of each UPS goes through a separate path in the "hot tie" switchboard and feeds the UPS distribution switchboard, which feeds the PDUs and critical loads. The critical loads come with dual cord technology, so each critical load is fed from both UPS systems. Thus, a single failure in a device, cord, PDU, or UPS module has no effect on the critical load.
If a catastrophic failure occurs on either UPS system, the "hot tie" switchboard transfers the load from one UPS to the other. This transfer can also take place manually, allowing scheduled maintenance and testing without interfering with data processing operations. This feature also allows the addition and replacement of modules without any consequences.
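A rough sketch of the UPS arrangement may help. The 500kVA module size, the five-module limit, and the single redundant module come from the text; the labels "UPS A", "UPS B", "path A", and "path B", and both function names, are hypothetical and chosen only for this example.

```python
MODULE_KVA = 500  # module rating, from the article


def available_kva(installed: int, failed: int = 0) -> int:
    """Capacity of one UPS system when one installed module is redundant."""
    rated = (installed - 1) * MODULE_KVA       # redundant module not counted
    healthy = max(installed - failed, 0) * MODULE_KVA
    return min(rated, healthy)


def hot_tie_sources(ups_a_ok: bool, ups_b_ok: bool) -> dict:
    """Which UPS system feeds each output path of the hot-tie switchboard."""
    if ups_a_ok and ups_b_ok:
        return {"path A": "UPS A", "path B": "UPS B"}  # normal: separate paths
    if ups_a_ok:
        return {"path A": "UPS A", "path B": "UPS A"}  # B failed or in maintenance
    if ups_b_ok:
        return {"path A": "UPS B", "path B": "UPS B"}  # A failed or in maintenance
    return {"path A": None, "path B": None}            # outside this sketch


print(available_kva(5, failed=0), available_kva(5, failed=1))
print(hot_tie_sources(ups_a_ok=True, ups_b_ok=False))
```

In this model a fully populated system is rated at 2000kVA, and losing one module leaves that rating unchanged. Losing a whole UPS system moves both output paths to the surviving one.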
A generator-served maintenance bypass allows one generator bus to directly serve each critical-load bus while maintenance personnel service and test the UPS system. The generator system's closed transition feature allows it to lift the critical load transparently from the UPS, serve the load through the maintenance period, and return it to UPS service, all without interrupting power to the critical load.
Data centers require almost continuous uptime and are completely intolerant of unscheduled downtime. Providing the best equipment is not enough to ensure 24-hr operation throughout the year. This installation shows one of the ways to overcome the inherent limitations of equipment reliability through a redundant design. Each data center requires a unique design to achieve its mission within defined project constraints.
• A/E/C Management Firm: McClier, Atlanta; Fernando Orti, P.E., lead electrical designer;
• Architect: Hecht, Burdeshaw, Johnson, Kidd & Clark; Tim Jensen, lead architect, Columbus, Ga.;
• Electrical Contractor: Alexander Electric, Columbus, Ga.;
• Generator Control Equipment: I&S Operations, Inc.; Alpharetta, Ga.;
• Owner's Representative: Chip Torbert;
• Electric Utility: Georgia Power Co., Atlanta.
Sidebar: What Does "High Nines" of Reliability Mean?
Traditionally, you achieve reliable power with an electrical system design that uses the local utility AC power grid, standby diesel generators, and an uninterruptible power supply (UPS). Operating as part of a high reliability system, a typical UPS system offers 99.9% reliability, or three nines. Adding more redundant features can boost a UPS system up to four nines of reliability.
While this may sound like an exceedingly high degree of reliability for many firms, four nines of reliability still translates into about 53 minutes of downtime per year.
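The conversion from a count of nines to annual downtime is simple arithmetic, sketched below. It treats "reliability" as the fraction of a 365-day year during which power is available, which is an assumption about how the figures are meant; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(nines: int) -> float:
    """Expected downtime per year for an availability of `nines` nines."""
    unavailability = 10.0 ** -nines   # e.g. 3 nines -> 0.001
    return MINUTES_PER_YEAR * unavailability


for n in (3, 4, 6):
    minutes = downtime_minutes_per_year(n)
    print(f"{n} nines: {minutes:.2f} min/yr ({minutes * 60:.1f} s/yr)")
```

On this basis, three nines allows roughly 526 minutes (about 8.8 hours) of downtime per year, four nines about 53 minutes, and six nines about 32 seconds.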
Sidebar: What Is Distributed Redundancy?
The power system for the Total System Services data center offers 99.9999%, or six nines of reliability, which translates into an unscheduled downtime of only 3 sec to 30 sec per year.
By using what we call distributed redundancy, you can maintain a high level of power reliability. This design allows ease of equipment maintenance, and it can fit within a typical investment/reliability ratio. According to a survey (by the Uptime Institute) of large data-processing center downtime, 79% of electrical infrastructure failures that interrupted critical load operations occurred between the UPS output bus and the critical load. The emphasis of critical-power system designs needs to shift from designing a "bulletproof" UPS system to creating a fault-tolerant UPS system. This moves the point at which power must be maintained from the output of the UPS to the input terminals of the load equipment.
Distributed redundancy means creating dual, full-capacity UPS-system buses and redundant power-distribution systems. This eliminates as many single points of failure as possible all the way up to the load equipment's input terminal. To provide fault tolerance, you must have some method of allowing the load equipment to receive power from either UPS power bus. Protecting against fast power system failures, such as circuit-breaker trips or a power system fault, requires a fast switching method, such as a static transfer switch, which creates a fast break-before-make transfer between two AC power sources.
The two AC power sources should be as independent as possible to eliminate common failures. Switching between the two power sources needs to be break-before-make, for the same reason.
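The break-before-make ordering can be illustrated with a toy model. It is not a representation of any real static transfer switch; the class, the source labels "A" and "B", and the step strings are hypothetical. It only shows that the outgoing source is opened before the incoming one is closed, so the two sources are never paralleled.

```python
class StaticTransferSwitch:
    """Toy model of a two-source switch that never parallels its sources."""

    def __init__(self):
        self.closed = {"A": True, "B": False}  # load starts on source A

    def transfer_to(self, target: str) -> list:
        if target not in self.closed:
            raise ValueError(f"unknown source: {target}")
        other = "B" if target == "A" else "A"
        steps = []
        if self.closed[other]:
            self.closed[other] = False   # break: open the outgoing source first
            steps.append(f"open {other}")
        if not self.closed[target]:
            self.closed[target] = True   # make: then close the incoming source
            steps.append(f"close {target}")
        # Invariant: both sources are never connected at the same time.
        assert not (self.closed["A"] and self.closed["B"])
        return steps


sts = StaticTransferSwitch()
print(sts.transfer_to("B"))
```

Keeping the two sources from ever being tied together is what preserves their independence: a fault on one bus cannot propagate to the other through the switch.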
A number of distributed-redundancy power configurations are possible; however, that redundancy should be as close to the load as possible to achieve its goal. This keeps power available at the load-equipment level. The ultimate distributed redundancy configuration would feature two independent UPS power-distribution systems with dual-input load equipment, so that redundant AC power flows up to and inside the load equipment.
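As a rough illustration of why two independent paths help, the idealized calculation below combines the availability of two paths feeding a dual-input load. It assumes the paths fail independently and that the load rides through the loss of either one; real systems share some failure modes, so this is an upper bound. It is not how any figure in this article was derived, and the function name is illustrative.

```python
def dual_path_availability(path_a: float, path_b: float) -> float:
    """Availability at the load when either of two independent paths suffices."""
    # The load loses power only if both paths are down at the same time.
    return 1.0 - (1.0 - path_a) * (1.0 - path_b)


combined = dual_path_availability(0.999, 0.999)
print(f"two three-nines paths: {combined:.6f}")
```

Under these idealized assumptions, two three-nines paths combine to about six nines at the load, which is why independence between the two sources matters so much.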