Remember that not all outages are created equal: Timing and duration have a significant impact on the costs of downtime. In the original example, the outage was perfectly timed to affect the largest number of potential customers and thus have the largest business impact. What if this outage occurred at 3 a.m. ET instead of noon ET? What if it happened on a different day? Or what if, instead of the Website being down for 4 hours straight on a single day, it was down for 30 minutes on eight different days? Shorter-duration outages tend to be less disruptive than longer ones. All of this must be taken into account when calculating the impact of an outage.
Don't try to tackle the entire infrastructure all at once; break down your calculations on a service-by-service basis, starting with the most critical business services. Understanding the costs of downtime will guide the appropriate level of investment in downtime prevention for these services.
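To make the effect of timing concrete, here is a minimal Python sketch of a per-service downtime estimate. The revenue figures, the peak window, and the function name outage_cost are hypothetical placeholders, not numbers from the example discussed earlier; substitute your own service's data.

```python
# Hypothetical hourly revenue for one service, indexed by hour of day (0-23).
# Assumption: $10,000 per hour from 8 a.m. to 8 p.m., $1,000 per hour otherwise.
HOURLY_REVENUE = [10_000 if 8 <= hour < 20 else 1_000 for hour in range(24)]


def outage_cost(start_hour, duration_hours, hourly_revenue):
    """Estimate lost revenue for one outage by summing one-minute slices."""
    minutes = round(duration_hours * 60)
    total = 0
    for minute in range(minutes):
        # Find which hour of the day this minute of the outage falls in.
        hour = int(start_hour + minute / 60) % 24
        total += hourly_revenue[hour]
    # Each slice is 1/60 of an hour's revenue.
    return total / 60


noon_outage = outage_cost(12, 4, HOURLY_REVENUE)
overnight_outage = outage_cost(3, 4, HOURLY_REVENUE)
print(f"4 hours starting at noon:   ${noon_outage:,.0f}")
print(f"4 hours starting at 3 a.m.: ${overnight_outage:,.0f}")
```

This sketch treats lost revenue as linear in duration, so it captures timing but not the observation that short outages tend to be less disruptive than long ones, nor softer costs such as reputation or lost productivity.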
Step 2: Focus Availability on the End-to-End Service, Not on Infrastructure Components
Many companies rigorously track server uptime and storage uptime, but few succeed in tracking a single service's uptime end to end, that is, across every infrastructure and software component that works together to deliver that service. This, however, is the single most important thing that an IT department can track because it is the metric that gets closest to the actual customer experience. This is critical in "the age of the customer," where businesses compete and differentiate themselves on the experience of IT-enabled business processes and transactions more than ever.
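One way to see why component uptime can mislead is the standard series-availability calculation: if a service needs every component to be up, and component failures are assumed to be independent, the end-to-end availability is the product of the component availabilities. The component names and figures in this Python sketch are hypothetical.

```python
import math


def end_to_end_availability(component_availabilities):
    """Availability of a service that needs every component up.

    Assumes failures are independent and there is no redundancy,
    which is a simplification of most real architectures.
    """
    return math.prod(component_availabilities)


# Hypothetical components, each reporting 99.9 percent uptime on its own.
components = {
    "network": 0.999,
    "web server": 0.999,
    "application server": 0.999,
    "database": 0.999,
    "storage": 0.999,
}

service = end_to_end_availability(components.values())
print(f"Best single component: {max(components.values()):.3%}")
print(f"End-to-end service:    {service:.3%}")
```

Under these assumptions the service is always less available than its weakest component, which is why a dashboard full of healthy component numbers can still sit alongside a poor customer experience.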
Step 3: Match Business Objectives to the Right Mix of Technologies
Once you've calculated your cost of downtime and shifted your focus to end-to-end availability, the next step is to select the right technologies to support your critical services. There are many technologies that can support the always-on, always-available extended enterprise, such as active-active architectures, rapid virtual machine rebooting, application and service monitoring solutions, or cloud-based services. The difficult part is finding an approach that supports your availability objectives and also matches what the business is willing to pay to protect critical services. Many enterprises find it useful to group services or applications into tiers of criticality and assign standard recovery time objectives (RTOs) and recovery point objectives (RPOs) as well as service-level agreements (SLAs) for availability. Organizations can then map appropriate technologies to the tiers of criticality based on the business requirements.
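A tiering scheme like this can be written down as a small lookup table. The Python sketch below is illustrative only: the tier names, RTO, RPO, and SLA values, the cost thresholds, and the pairing of technologies to tiers are all assumptions, not recommendations from this article.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Tier:
    name: str
    rto: timedelta            # recovery time objective
    rpo: timedelta            # recovery point objective
    availability_sla: float   # target fraction of time the service is up
    technologies: tuple[str, ...]


# Hypothetical tiers, ordered from most to least critical.
TIERS = (
    Tier("tier 1", timedelta(minutes=5), timedelta(0), 0.9999,
         ("active-active architecture", "application and service monitoring")),
    Tier("tier 2", timedelta(hours=1), timedelta(minutes=15), 0.999,
         ("rapid virtual machine rebooting", "application and service monitoring")),
    Tier("tier 3", timedelta(hours=24), timedelta(hours=24), 0.99,
         ("cloud-based services",)),
)


def assign_tier(hourly_downtime_cost):
    """Pick a tier from the estimated cost of one hour of downtime.

    The dollar thresholds are placeholders; each business sets its own.
    """
    if hourly_downtime_cost >= 10_000:
        return TIERS[0]
    if hourly_downtime_cost >= 1_000:
        return TIERS[1]
    return TIERS[2]


for service_name, cost in [("online store", 50_000), ("intranet wiki", 200)]:
    tier = assign_tier(cost)
    print(f"{service_name}: {tier.name}, RTO {tier.rto}, RPO {tier.rpo}")
```

Driving the tier from the downtime cost keeps the spending decision tied to the business impact estimated in Step 1 rather than to the technology that happens to be available.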
100 Percent Uptime Is Virtually Impossible
In the end, the goal of the always-on, always-available enterprise is not 100 percent uptime; rather, it is 100 percent service continuity for your most critical services. While many companies have gotten very close, sustaining true 100 percent uptime for any extended period of time is virtually impossible -- there are too many things that can go wrong, from the infrastructure to the applications to natural disasters, human error, or even planned maintenance.
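To put "very close" in perspective, an availability target can be converted into the downtime it permits. This Python sketch assumes a 365-day year and counts planned maintenance as downtime; the function name allowed_downtime and the targets listed are illustrative.

```python
from datetime import timedelta


def allowed_downtime(availability, period=timedelta(days=365)):
    """Return the total downtime a given availability target permits."""
    return period * (1 - availability)


for target in (0.99, 0.999, 0.9999, 0.99999):
    budget = allowed_downtime(target)
    minutes = budget.total_seconds() / 60
    print(f"{target:.3%} uptime allows about {minutes:,.1f} minutes of downtime per year")
```

Each additional "nine" cuts the yearly downtime budget by a factor of ten, which is why the last fraction of a percent is the most expensive to achieve.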