When Business Services Fail

In October 2016, a massive cyberattack made headlines around the world. Dyn, one of the Internet’s premier DNS service providers, was hit by an overwhelming DDoS attack that took some of the world’s best-known websites offline. Twitter, Netflix, PayPal, Spotify and many others fell silent – first on the east coast of North America, and then throughout the entire continent and into Europe. In all, the attack came in three waves, creating widespread disruption for more than 11 hours.

This type of high-profile cyberattack ignites public interest – and so it should, since it highlights the fundamental vulnerability of the Internet. However, hacking is only one component of the service availability problem – in fact, when business services fail, it’s often due to misconfiguration or equipment failure. That’s the hidden story – the huge impact of service outages that have absolutely nothing to do with the dark underbelly of the Internet.

So, just how bad is the problem? According to a 2016 study by IHS Markit, service downtime costs businesses $700 billion a year – and that’s just in North America. Midsized and large enterprises experience an average of 5 outages and 27 hours of downtime every month – that’s more than 300 hours of downtime every year. And cyberattacks are responsible for only about 10% of these outages. 40% are due to equipment issues, and 25% are due to service provider problems and human error – fat fingers, in other words. Compared to these figures, the highly publicized Dyn outage was relatively small.

Let’s put that $700 billion number in perspective. Total US GDP in 2015 was approximately $18 trillion, so $700 billion is equivalent to a staggering 3.9% of America’s total economic output. By comparison, US federal defense expenditures are approximately $590 billion a year, and total US Medicare spending is slightly more than $640 billion annually. In other words, IT service outages cost North American businesses more than the US spends on protecting the nation or caring for the health of its senior citizens.

Why Are Downtime Costs So High?

The answer is more complicated than you might think.

Perhaps the most obvious answer is lost productivity. For example, consider a critical service outage that affects 1,000 employees for just five hours. And let’s be conservative and assume that the outage only results in a 50% reduction in productivity – each employee loses 2.5 hours of productive work due to the outage. That’s still 2,500 hours lost – or around $125,000, assuming a labor cost of around $50 an hour (roughly $100,000 a year). Of course, your mileage may vary depending on your industry, but that’s still $25,000 for every hour of the outage.
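The productivity arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function and parameter names are hypothetical, and the $50 hourly labor cost is an assumption, chosen because it is the rate that produces the $125,000 total in the example.

```python
def productivity_loss(employees, outage_hours, productivity_drop, hourly_cost):
    """Estimate the labor cost of an outage (illustrative model only)."""
    # Productive hours lost across all affected employees
    lost_hours = employees * outage_hours * productivity_drop
    # Dollar value of those hours at the assumed hourly labor cost
    total_cost = lost_hours * hourly_cost
    return {
        "lost_hours": lost_hours,
        "total_cost": total_cost,
        "cost_per_outage_hour": total_cost / outage_hours,
    }


# 1,000 employees, 5-hour outage, 50% productivity drop, assumed $50/hour
result = productivity_loss(1000, 5, 0.5, 50.0)
print(result)
```

Swapping in your own headcount, outage length and labor cost gives a quick first estimate for your organization.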

And that’s just the start. If the outage is in a customer-facing system, then immediate revenues are at risk. Again, let’s look at an example. Assume that your company does 50% of its business either online or through a contact center. Now, a five-hour outage during peak times – say between 4 PM and 9 PM – could well impact 50% of your revenues for the day. If your turnover is $1 million a day, that’s $500,000 in lost or deferred revenues – or $100,000 an hour. And that number gets bigger the larger you are – for example, when Amazon went down for 30 minutes in August 2013, some estimates put the cost at more than $66,000 a minute. Coincidentally, Google went down for five minutes just a few days earlier, costing the company $545,000 – or more than $100,000 a minute – while reducing total Internet traffic by 40%.
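The revenue example works the same way. The sketch below reproduces the hypothetical $1 million-a-day company from the paragraph above; the function name and parameters are illustrative assumptions, not an established model.

```python
def revenue_at_risk(daily_revenue, share_affected, outage_hours):
    """Estimate lost or deferred revenue for one outage (illustrative only)."""
    # Portion of the day's revenue exposed to the outage
    total = daily_revenue * share_affected
    # Average exposure for each hour the service is down
    per_hour = total / outage_hours
    return total, per_hour


# $1 million daily turnover, 50% of the day's revenue affected, 5-hour outage
total, per_hour = revenue_at_risk(1_000_000, 0.5, 5)
print(f"At risk: ${total:,.0f} in total, ${per_hour:,.0f} per hour")
```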

Don’t Forget the Long-Term Costs

Immediate losses are only part of the picture. Service outages also have long-term implications. For example, think about customer loyalty – and what that means for repeat sales. The financial impact of this can vary widely from industry to industry, but it can be enormous. For example, when Salesforce experienced an outage in May 2016, it sparked headlines such as “Salesforce Outage: Can Customers Trust the Cloud?” That’s not the sort of publicity that any company wants – especially if its business depends on customer trust in its technology platform. At the time, Gartner VP and Fellow Yefim Natis commented, “This happening once will be forgiven. However, if this Salesforce outage is an indication of things to come for Salesforce’s systems, customers may start looking for other solutions.”

And the list goes on. Think about paying financial damages – for instance, the $20 million that reservations management company Navitaire had to pay Virgin Blue (now Virgin Australia) when the airline’s reservation system melted down. Or the impact of outages on employee morale and retention. Or compliance and reporting penalties. Or even the IT cost of just fixing outages. Or … well, you get the idea.

How Do You Tackle the Problem?

The answer is seductively simple. Prevent service outages wherever possible – and fix them more quickly when they happen. However, that’s a bit like saying you can live to a ripe old age if you don’t die. Exactly how do you avoid dying? Or, in this case, how do you reduce service outages and resolve them faster?

Let’s start with preventing service outages. There are three main ways to do this:

  1. First, architect your business services to deliver the right level of redundancy and reliability. This guards against equipment failures and network disruption.
  2. Second, implement effective change management processes. This dramatically reduces the likelihood of human error.
  3. Third, deploy intelligent technologies that proactively detect and diagnose issues before they cause service outages. In many cases, these platforms increase proactive detection by up to 300%.

The same intelligent technology solutions also accelerate remediation of service outages. By automatically identifying the root cause of business service issues – often in as little as 5 seconds – these platforms eliminate huge amounts of time and effort, typically reducing service outage times by 50%. That’s roughly 150 hours of downtime saved for a typical enterprise every year – and much more for large enterprises, helping them avoid tens of millions of dollars in downtime costs.
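As a rough check on those savings, the sketch below starts from the 27 hours of monthly downtime quoted earlier and applies the 50% reduction. The function names are hypothetical, and the calculation assumes the reduction applies evenly to all downtime.

```python
def annual_downtime_hours(hours_per_month):
    """Convert average monthly downtime into an annual figure."""
    return hours_per_month * 12


def hours_saved(annual_hours, reduction):
    """Downtime avoided if outage times fall by the given fraction."""
    return annual_hours * reduction


annual = annual_downtime_hours(27)  # 27 hours a month, per the study cited earlier
saved = hours_saved(annual, 0.5)    # 50% reduction in outage times
print(f"{annual} hours of downtime a year, {saved:.0f} hours saved")
```

The result, 162 hours saved out of 324, is in line with the rounded figure of about 150 hours.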

And that’s an investment worth making.
