Availability Concepts for Networks and Systems

Network and System High Availability Levels. Bradley Mitchell

In computer hardware and software, availability refers to the overall "uptime" of the system (or specific features of the system). For example, a personal computer may be deemed "available" for use if its operating system is booted and running.

While related to availability, the concept of reliability means something different. Reliability refers to the general likelihood of a failure occurring in a running system.

A perfectly reliable system will also enjoy 100% availability, but when failures do occur, availability can be affected in different ways depending on the nature of the problem.

Serviceability affects availability as well. In a serviceable system, failures can be detected and repaired more quickly than in an unserviceable system, meaning less downtime per incident on average.
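
One common way to put numbers on this relationship is the steady-state availability formula, which divides mean time between failures (MTBF, a measure of reliability) by the sum of MTBF and mean time to repair (MTTR, a measure of serviceability). These two terms and the sample figures below are not taken from this article; they are assumptions used only to illustrate the idea in a minimal sketch.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a system is up, given average time between
    failures (MTBF) and average time to repair each failure (MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures: the same failure rate, but two different repair times.
slow_repair = steady_state_availability(mtbf_hours=1000, mttr_hours=10)
fast_repair = steady_state_availability(mtbf_hours=1000, mttr_hours=1)

print(f"10-hour repairs: {slow_repair:.4%} available")
print(f" 1-hour repairs: {fast_repair:.4%} available")
```

With the failure rate held constant, cutting the repair time from ten hours to one moves this hypothetical system from roughly two nines to roughly three nines of availability.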

Availability Levels

The standard way to define levels or classes of availability in a computer network system is a "scale of nines." For example, 99% uptime translates to two nines of availability, 99.9% uptime to three nines, and so on. The table shown on this page illustrates the meaning of this scale. It expresses each level in terms of the maximum amount of downtime per (nonleap) year that could be tolerated to meet the uptime requirement. It also lists a few examples of the type of systems being built that commonly meet these requirements.
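
The downtime allowance behind each level can be worked out directly from the uptime percentage. The short sketch below does that arithmetic for a 365-day year; the function and constant names are illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # non-leap year: 525,600 minutes


def max_downtime_minutes(nines: int) -> float:
    """Maximum downtime per non-leap year, in minutes, that still meets
    an uptime target with the given number of nines."""
    unavailability = 10 ** -nines  # e.g. three nines -> 0.001
    return MINUTES_PER_YEAR * unavailability


for n in range(2, 6):
    uptime_pct = 100 * (1 - 10 ** -n)
    minutes = max_downtime_minutes(n)
    print(f"{n} nines ({uptime_pct:.3f}% uptime): "
          f"{minutes:,.1f} minutes/year ({minutes / 60:,.2f} hours)")
```

Each additional nine cuts the allowance by a factor of ten: about 3.65 days per year at two nines, about 8.76 hours at three nines, and a little over five minutes at five nines.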

When talking about availability levels, note that the overall time frame involved (weeks, months, years, etc.) should be specified to give the strongest meaning. A product that achieves 99.9% uptime over a period of one or more years has proven itself to a much greater degree than one whose availability has only been measured for a few weeks.

Network Availability: An Example

Availability has always been an important characteristic of systems but becomes an even more critical and complex issue on networks. By their nature, network services are commonly distributed across several computers and can depend on various other auxiliary devices as well.

Take the Domain Name System (DNS), for example, which is used on the Internet and many private intranets to map computer names to their network addresses. DNS keeps its index of names and addresses on a server called the primary DNS server. When only a single DNS server is configured, a server crash takes down all DNS capability on that network. DNS, however, supports distributed servers: besides the primary server, an administrator can also install secondary and tertiary DNS servers on the network. With three servers in place, a failure in any one of them is much less likely to cause a complete loss of DNS service.

Server crashes aside, other types of network outages also affect DNS availability. Link failures, for example, can effectively take down DNS by making it impossible for clients to communicate with a DNS server. In these scenarios it is not uncommon for some people (depending on their physical location on the network) to lose DNS access while others remain unaffected.

Configuring multiple DNS servers also helps to deal with these indirect failures that can impact availability.
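
A rough sense of how much redundancy helps comes from a simple probability estimate: if each server fails independently, the service is lost only when every server is down at the same time. The sketch below makes that calculation. The 99% per-server figure is hypothetical, and the independence assumption is optimistic, since servers that share a link or a power source tend to fail together.

```python
def combined_availability(server_availabilities):
    """Probability that at least one server is up, ASSUMING failures are
    independent (shared links or power would make the real figure lower)."""
    p_all_down = 1.0
    for availability in server_availabilities:
        p_all_down *= 1.0 - availability
    return 1.0 - p_all_down


# Hypothetical example: each DNS server is individually available 99% of the time.
single = combined_availability([0.99])
triple = combined_availability([0.99, 0.99, 0.99])

print(f"one server:    {single:.6f}")
print(f"three servers: {triple:.6f}")
```

Under these assumptions, three two-nines servers together approach six nines. In practice the gain is smaller because failures are often correlated, which is one reason to place redundant servers on different parts of the network.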

Perceived Availability and High Availability

Outages are not all created equal: the timing of failures also plays a big role in the perceived availability of a network. A business system that suffers frequent weekend outages, for example, may show relatively low availability numbers, but this downtime may not even be noticed by the regular workforce.

The networking industry uses the term high availability to refer to systems and technologies specially engineered for reliability, availability, and serviceability.

Such systems typically include redundant hardware (e.g., disks and power supplies) and intelligent software (e.g., load balancing and failover functionality). The difficulty of achieving high availability increases dramatically at the four- and five-nines levels, so vendors can charge a premium for these features.