The recent surge in Internet usage has been accompanied by an equally large demand for high-quality power to feed the evolving infrastructure. Internet power consumption is now growing by hundreds of megawatts per week, taxing an already stressed electrical grid. Because they require a continuous, uninterrupted supply of high-quality power, critical Internet facilities must rely on internal power-quality protection systems. The caliber of these systems is often a defining factor for users, for whom a single power-related incident can cost millions of dollars.
As the Internet continues to grow, its criticality and reliability take on ever greater importance. Power quality is the number one issue affecting Internet reliability today. The critical power requirements of typical Internet data centers are increasing almost exponentially, and so are the consequences of a power interruption.
The engineering community now faces the task of selecting the correct power-system topology for data centers. By evaluating popular power-system topologies from the standpoint of expandability and reliability, we can determine the best configuration for today's modern Internet infrastructure.
Selecting a UPS Topology
Because many critical facilities require more power and higher levels of power quality and reliability, an uninterruptible power supply (UPS) is necessary to ensure a stable power environment. Even with a UPS, achieving the “zero downtime” or the “five to six nines” of availability that large critical facilities demand takes careful configuration planning. The single most important contributor to system reliability is redundancy, which you can achieve with many different UPS configurations. This article focuses on the three options most frequently selected for large Internet data centers: isolated redundant, distributed redundant, and parallel.
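To put the “five to six nines” target in perspective, the short sketch below converts an availability expressed as a number of nines into expected downtime per year. It assumes an 8,760-hour (non-leap) year, and the function name is illustrative only.

```python
HOURS_PER_YEAR = 8760  # assumption: non-leap year


def annual_downtime_minutes(nines: int) -> float:
    """Expected downtime per year, in minutes, for an availability of `nines` nines.

    Five nines means 99.999% available, i.e. unavailable 10**-5 of the time.
    """
    unavailability = 10.0 ** -nines
    return unavailability * HOURS_PER_YEAR * 60.0


for n in (3, 4, 5, 6):
    print(f"{n} nines: {annual_downtime_minutes(n):.2f} minutes per year")
```

Five nines allows roughly 5.3 minutes of downtime per year; six nines allows about 32 seconds.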
Isolated Redundant Configuration
In an isolated redundant system, utility power must never feed the critical load, even during UPS shutdown or maintenance. In a traditional parallel system, a static switch cabinet sits between the critical load and the redundant UPS modules. In an isolated redundant configuration, however, this static switch bypass is eliminated, and each primary UPS module feeds the critical load on its own isolated bus. Redundancy comes from a separate redundant UPS system that feeds the internal static bypass switch and the maintenance bypass of each primary module. The redundant UPS system is always fully operational, keeping its own battery plant charged and ready to support the critical load. What's the key to an isolated redundant system design? The standby UPS must be able to accept a 100% step load with a minimal voltage transient.
This system satisfies the essential design elements in the following ways:
Availability. Should the primary module go offline, power is seamlessly transferred to the redundant system.
Maintainability. The primary and redundant modules are provided with a maintenance bypass, allowing complete isolation for maintenance while maintaining conditioned UPS power to the critical load bus.
Reliability. Each module operates independently, which eliminates system-level controls as a potential point of failure and contributes to the configuration's very high reliability.
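The transfer behavior described above can be sketched as a small state function for one primary module and its isolated bus. This is a simplified model under stated assumptions: the names `Source` and `critical_bus_source` are hypothetical, transfer timing and step-load transients are ignored, and it considers one primary module at a time (it does not check whether a single redundant system can carry several failed primaries at once).

```python
from enum import Enum
from typing import Optional


class Source(Enum):
    PRIMARY = "primary UPS inverter"
    REDUNDANT_VIA_STATIC_BYPASS = "redundant UPS via the primary module's internal static bypass"
    REDUNDANT_VIA_MAINTENANCE_BYPASS = "redundant UPS via the primary module's maintenance bypass"


def critical_bus_source(primary_online: bool,
                        primary_in_maintenance: bool,
                        redundant_online: bool) -> Optional[Source]:
    """Return the source feeding one isolated critical bus, or None if it is unsupported.

    Utility power is never a possible result: both bypass paths of the primary
    module are fed from the redundant UPS system, not from the utility.
    """
    if primary_in_maintenance:
        # Primary isolated for service; conditioned power comes from the redundant UPS.
        return Source.REDUNDANT_VIA_MAINTENANCE_BYPASS if redundant_online else None
    if primary_online:
        return Source.PRIMARY
    # Primary has dropped offline: its internal static switch transfers the load,
    # which the redundant UPS must accept as a 100% step load.
    return Source.REDUNDANT_VIA_STATIC_BYPASS if redundant_online else None


print(critical_bus_source(primary_online=False, primary_in_maintenance=False,
                          redundant_online=True))
```

Returning `None` rather than a utility source makes the design rule explicit: if both the primary and the redundant UPS are unavailable, the model reports the bus as unsupported instead of falling back to raw utility power.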
Distributed Redundant System
One of the most prevalent critical power-system configurations in today's Internet data centers is the distributed redundant configuration. Its most attractive feature is that system capacity can be expanded without interrupting or compromising the flow of critical power to other loads. This configuration uses multiple separate primary busses, each designed to support a specific area of the data center. There is also a single redundant bus, typically rated at a kVA equal to or larger than that of the largest primary bus. The output of each primary UPS module connects to the preferred input of a static transfer switch (STS), and the alternate input of the STS connects to the output of the redundant UPS system. Under normal conditions, the primary bus feeds the critical load. If the STS detects any deviation in power quality, it immediately transfers the load to the redundant bus.
Technicians at many data centers realize that an increase in power requirements is inevitable, but management doesn't want to build a power system to its full capacity until the facility can use it. That's why data centers often select a configuration based on its ease of expansion. The distributed redundant system meets these requirements: because the primary and redundant busses operate completely independently of each other, capacity can be added on the primary busses without risk of interrupting other parts of the system.
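A minimal sketch of the per-bus STS decision and the resulting load on the redundant bus follows. The `PrimaryBus` class, the boolean `power_ok` flag standing in for the STS's power-quality detection, and all kVA figures are hypothetical; real STS units use voltage and timing thresholds that are not modeled here. Because sizing the redundant bus to the largest primary bus only guarantees that one transfer at a time fits, the sketch totals the transferred load rather than assuming it fits.

```python
from dataclasses import dataclass


@dataclass
class PrimaryBus:
    name: str
    rating_kva: float
    load_kva: float
    power_ok: bool = True  # stands in for the STS's power-quality detection


def sts_positions(buses: list[PrimaryBus]) -> dict[str, str]:
    """Each bus's STS stays on its preferred (primary) input unless power quality deviates."""
    return {bus.name: ("primary" if bus.power_ok else "redundant") for bus in buses}


def redundant_bus_load(buses: list[PrimaryBus]) -> float:
    """Total load (kVA) transferred onto the single redundant bus."""
    return sum(bus.load_kva for bus in buses if not bus.power_ok)


# Hypothetical system: three primary busses and one redundant bus.
buses = [
    PrimaryBus("A", rating_kva=750, load_kva=600),
    PrimaryBus("B", rating_kva=750, load_kva=500, power_ok=False),  # deviation detected
    PrimaryBus("C", rating_kva=500, load_kva=400),
]
redundant_kva = 750  # rated at least as large as the largest primary bus

print(sts_positions(buses))
print(redundant_bus_load(buses) <= redundant_kva)
```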
The key features of distributed redundant systems include:
Availability. The STS improves system availability: shutdown of the primary UPS or any deviation in power quality causes an uninterrupted transfer to the redundant system.
Maintainability. Transferring the load through the STS to the redundant system allows complete isolation of the primary bus for maintenance.
Reliability. As in an isolated redundant system, each module operates independently of all other modules (primary and redundant), eliminating system-level controls and optimizing reliability.
Expandability. Because all busses are independent, you can add primary busses to meet growing power demand without affecting the existing primary or redundant busses. This allows the system to expand on an as-needed basis.
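The expandability rule can be expressed as a check run before a new primary bus is added. This sketch assumes the sizing rule stated for this configuration (a redundant bus rated at least as large as the largest primary bus); `Bus` and `add_primary_bus` are hypothetical names. The function returns a new tuple so the existing busses are never modified, mirroring the independence of the busses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bus:
    name: str
    rating_kva: float


def add_primary_bus(primaries: tuple[Bus, ...], redundant: Bus, new: Bus) -> tuple[Bus, ...]:
    """Return the primary-bus list with `new` added; existing busses are left untouched.

    Enforces the sizing rule: the redundant bus must be rated at least as large
    as every primary bus it backs up.
    """
    if any(bus.name == new.name for bus in primaries):
        raise ValueError(f"bus {new.name!r} already exists")
    if new.rating_kva > redundant.rating_kva:
        raise ValueError(
            f"{new.name} ({new.rating_kva} kVA) exceeds the redundant bus "
            f"rating ({redundant.rating_kva} kVA)"
        )
    return primaries + (new,)


# Hypothetical ratings.
redundant = Bus("R", 1000)
system = (Bus("P1", 750), Bus("P2", 750))
system = add_primary_bus(system, redundant, Bus("P3", 1000))  # accepted

try:
    add_primary_bus(system, redundant, Bus("P4", 1200))
except ValueError as err:
    print("rejected:", err)
```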
Parallel Redundant Configuration
Even the largest single UPS module cannot provide enough power for most Internet data center applications. Therefore, additional UPS modules are frequently installed in a parallel bank to provide both capacity and module redundancy. In fact, the parallel redundant system (in which one or more modules in a parallel bank can be taken offline while the remainder supports the load) is still the most popular type of redundant configuration.
The most conventional way to parallel UPS modules is to bus their outputs together through a single static bypass switch. Under normal operation, all the UPS outputs work in parallel to supply the load. If the UPS system must be taken offline, the system-level static switch seamlessly transfers the critical load to the bypass source (utility or a redundant UPS).
The reliability of the parallel configuration depends on the integrity and redundancy of the parallel controls and on whether the parallel bus itself is redundant. In other words, if one UPS module drops out of service, the remaining modules must have enough capacity to carry the load, and the bypass should be fed with protected power from another UPS. In many cases, a parallel redundant system forms the redundant bus of a distributed redundant system, as well as the building block for many other configurations.
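The requirement that the remaining modules carry the load when one drops out is commonly called N+1 sizing. The sketch below computes the module count and checks an existing bank, assuming ratings in kVA, ideal load sharing, and no power-factor derating; the function names and figures are hypothetical.

```python
import math


def modules_for_n_plus_r(load_kva: float, module_kva: float, r: int = 1) -> int:
    """Modules needed in a parallel bank so the load is still carried with `r` modules offline."""
    if load_kva <= 0 or module_kva <= 0 or r < 0:
        raise ValueError("load and module ratings must be positive, r non-negative")
    n = math.ceil(load_kva / module_kva)  # modules needed just to carry the load
    return n + r


def survives_any_single_dropout(load_kva: float, module_ratings: list[float]) -> bool:
    """True if the remaining modules can carry the load whichever single module drops out."""
    total = sum(module_ratings)
    return all(total - rating >= load_kva for rating in module_ratings)


# Hypothetical 1,400 kVA load served by 500 kVA modules.
print(modules_for_n_plus_r(1400, 500))
print(survives_any_single_dropout(1400, [500] * 4))
print(survives_any_single_dropout(1400, [500] * 3))
```

Three 500 kVA modules can carry 1,400 kVA only while all three run; a fourth module is what makes the bank redundant.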
Determining Topology Reliability
At the end of the day, all that really matters is reliability. The term is probably one of the most loosely used words in any industry, simply because gauging the reliability of most systems is highly subjective. To address the needs of Internet data centers, you must look at reliability in terms of downtime. If you ask data center managers how much downtime is acceptable, the standard answer is “none.” Aiming for zero downtime is a noble target, but ascertaining the integrity of a power system design is not a straightforward task.
The first way to determine a system's reliability is to use empirical, or demonstrated, data available for a specific design. For example, if the UPS manufacturer has a number of identical systems operating in the field and their reliability or performance has been well documented, you can gauge the expected reliability of similar systems you plan to install. Many manufacturers give a mean-time-between-failure (MTBF) figure in hours, based on the cumulative operating hours of all running units since the last failure of any unit. This figure (known as a demonstrated MTBF) is very attractive, but it can be unreliable. Because its accuracy depends on the manufacturer acquiring accurate data on all field failures, any inaccurate reporting will skew the results. This method also favors the manufacturer with the largest installed base. For these reasons, be cautious when viewing MTBF data calculated from the demonstrated operation of the equipment.
A more traditional method of calculating MTBF or equipment reliability is to use standard MTBF calculations, as defined by MIL-STD 217. This method is based on the statistical failure rates of the individual electronic components that make up the system. Like the demonstrated MTBF method, it is not a bulletproof way of predicting system reliability, because variations in the methodology can yield different answers. In addition, the accumulated data for a given component may not be applicable to the specific components used in the system, which may be of higher or lower quality or of different construction altogether. Table 1 indicates the reliability of various configurations derived from calculated MTBFs.
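To see why inaccurate reporting skews a demonstrated MTBF, consider the simple ratio of cumulative fleet operating hours to reported failures. This ratio is a common way of expressing demonstrated MTBF and is assumed here for illustration; a manufacturer's exact accounting (such as the “since the last failure” basis described above) may differ, and the fleet figures are hypothetical.

```python
import math


def demonstrated_mtbf(total_unit_hours: float, reported_failures: int) -> float:
    """Demonstrated MTBF as cumulative field operating hours per reported failure."""
    if reported_failures == 0:
        return math.inf  # no reported failures: the figure is unbounded, not proven perfect
    return total_unit_hours / reported_failures


fleet_hours = 500 * 8_000  # hypothetical: 500 units, 8,000 hours each

print(demonstrated_mtbf(fleet_hours, 10))  # every field failure reported
print(demonstrated_mtbf(fleet_hours, 5))   # half the failures never reach the manufacturer
```

Halving the reported failures doubles the apparent MTBF, from 400,000 to 800,000 hours, even though the equipment itself is unchanged.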
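Component-based predictions in the MIL-STD 217 family start from failure rates of individual parts. The sketch below shows only the core parts-count arithmetic, assuming constant failure rates and a series model (any part failure fails the system). The part list and rates are hypothetical, not values from the standard, and the quality and environment adjustment factors the standard applies are omitted.

```python
def parts_count_mtbf_hours(parts: list[tuple[str, int, float]]) -> float:
    """Series-model MTBF from (part, quantity, failures per 10**6 hours) entries.

    The system failure rate is the sum of the part failure rates; MTBF = 1 / rate.
    """
    rate_per_hour = sum(qty * fpmh for _, qty, fpmh in parts) / 1e6
    if rate_per_hour <= 0:
        raise ValueError("total failure rate must be positive")
    return 1.0 / rate_per_hour


# Hypothetical part list: names and rates are illustrative only.
parts = [
    ("capacitor", 100, 0.5),
    ("power semiconductor", 20, 2.0),
    ("control board", 4, 10.0),
]
print(f"{parts_count_mtbf_hours(parts):,.0f} hours")
```

For this hypothetical list the summed rate is 130 failures per million hours, an MTBF of about 7,692 hours. Changing any single rate changes the answer, which is one reason different applications of the method can disagree.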
Table 1. Reliability of UPS configurations derived from calculated MTBFs.

| UPS system configuration | MTBF (hours) | Availability | Mean downtime over five years (hours) |
|---|---|---|---|
| Utility power | <4,000 | <99.9% | >20 |
| Single module without static bypass | 27,440 | 99.97813% | 9.58 |
| Single module with static bypass | 250,000 | 99.99760% | 1.05 |
| Shared parallel redundant (2 modules) | 380,000 | 99.99789% | 0.92 |
| Isolated or distributed redundant | 498,000 | 99.99919% | 0.35 |
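The availability and downtime columns of Table 1 are related by simple arithmetic: over five years (5 × 8,760 = 43,800 hours, ignoring leap days), mean downtime is (1 − A) × 43,800 hours, where A is the availability. The sketch below reproduces the downtime column from the availability column. It also solves the standard steady-state relation A = MTBF / (MTBF + MTTR) for the mean time to repair (MTTR); the table does not state which MTTR values it assumed, so those outputs are inferences only.

```python
HOURS_IN_FIVE_YEARS = 5 * 8760  # 43,800 hours, ignoring leap days


def five_year_downtime_hours(availability_pct: float) -> float:
    """Mean downtime over five years implied by an availability percentage."""
    return (1.0 - availability_pct / 100.0) * HOURS_IN_FIVE_YEARS


def implied_mttr_hours(mtbf_hours: float, availability_pct: float) -> float:
    """Solve A = MTBF / (MTBF + MTTR) for MTTR."""
    a = availability_pct / 100.0
    return mtbf_hours * (1.0 - a) / a


# (MTBF hours, availability %) rows copied from Table 1.
table1 = {
    "Single module without static bypass": (27_440, 99.97813),
    "Single module with static bypass": (250_000, 99.99760),
    "Shared parallel redundant (2 modules)": (380_000, 99.99789),
    "Isolated or distributed redundant": (498_000, 99.99919),
}
for name, (mtbf, avail) in table1.items():
    print(f"{name}: {five_year_downtime_hours(avail):.2f} h downtime, "
          f"implied MTTR {implied_mttr_hours(mtbf, avail):.1f} h")
```

The computed downtimes round to 9.58, 1.05, 0.92, and 0.35 hours, matching the last column. The implied MTTR differs from row to row (roughly 4 to 8 hours), so the configurations appear not to have been evaluated with a single common repair time.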
Another well-accepted way to assess reliability is to examine the configuration with a straightforward analytical approach. Table 2 lists the factors that improve or decrease power-system reliability.
Table 2. Reliability determinants.
Factors That IMPROVE UPS Reliability
- Module redundancy
- Increased UPS bypass source reliability
- Individual battery systems
- Simplified operator interfaces and procedural safeguards
- IGBT inverter technology (due to lower parts count)
- Use of recognized and agency-listed standard components
Factors That DECREASE UPS Reliability
- Complicated switchgear systems
- Shared controls or single points of failure
- Common battery systems
- System complexity
- Poor environmental conditions
- UPS topologies with narrow input voltage and frequency windows or topologies
Utilities may not be able to meet the needs of the growing Internet infrastructure. They may also have difficulty supplying the quantity of power that data centers require, which has popularized on-site power generation to sustain loads during peak-demand periods. These factors place the burden of power quality on the end user. The distributed redundant configuration meets the necessary reliability requirements, and it is among the easiest and safest systems in which to expand capacity under normal operating conditions. Other configurations offer a level of reliability equivalent to the distributed redundant configuration, but they do not offer the same ease of maintenance or expansion.
This article is based on a paper presented at Power Quality 2000.