Why Automated Root Cause Analysis is Critical
Peter is a 54-year-old man. He arrives at the hospital emergency department in the middle of the night, complaining of a pain in the center of his stomach and severe nausea. After waiting several hours, he’s examined by a doctor. The doctor takes Peter’s temperature – which is normal – and feels his stomach, which is slightly tender. The doctor decides that Peter has a dose of gastroenteritis, and sends him home with some medication for nausea.
Fast-forward 24 hours, and Peter’s back at the hospital again. This time, he had to be brought in by ambulance. He now has all of the classic symptoms of appendicitis – severe right lower quadrant pain, vomiting, fever and a swollen abdomen. Worse still, his appendix has burst, causing peritonitis – a potentially life-threatening condition that can lead to widespread sepsis, a massive drop in blood pressure, organ failure and ultimately death.
If only the doctor had made the right diagnosis when Peter first showed up at the hospital.
How Much Do IT Emergencies Cost You?
Just like in medicine, it’s incredibly important to get to the root cause of IT issues. In IT, it’s not usually a matter of life or death, but the consequences of getting it wrong can be devastating. According to Gartner, the average cost of unplanned IT downtime is $5,600 per minute, or roughly $336,000 per hour.
And this makes complete sense. For example, if your company has 5,000 employees, a single hour’s worth of lost productivity can easily cost $300,000 – that’s only $60 per employee. Similarly, if you’re an online retailer, think what would happen if your e-commerce site went down on Black Friday. And it’s not just about the immediate losses – the long-term repercussions can be equally damaging, both to your company’s reputation and to your revenues. When customers find that your site has gone dark or don’t get the support they need from your contact center, there’s a good chance they’ll go to your competitor next time.
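To make the arithmetic above explicit, here is a minimal Python sketch. The $5,600-per-minute figure is the Gartner average quoted above. The 5,000 employees and $60 per employee-hour come from the example. The function names are illustrative only, and the model assumes cost scales linearly with time and headcount.

```python
# Back-of-the-envelope downtime cost model (illustrative; assumes linear costs).

GARTNER_COST_PER_MINUTE = 5_600  # USD, average cost of unplanned downtime


def hourly_cost_from_per_minute(cost_per_minute: float) -> float:
    """Convert a per-minute downtime cost into a per-hour cost."""
    return cost_per_minute * 60


def productivity_loss(employees: int, cost_per_employee_hour: float, hours: float) -> float:
    """Lost productivity when `employees` people are idle for `hours` hours."""
    return employees * cost_per_employee_hour * hours


if __name__ == "__main__":
    per_hour = hourly_cost_from_per_minute(GARTNER_COST_PER_MINUTE)
    # prints: Gartner average: $336,000 per hour
    print(f"Gartner average: ${per_hour:,.0f} per hour")
    idle = productivity_loss(employees=5_000, cost_per_employee_hour=60, hours=1)
    # prints: 5,000 idle employees for one hour: $300,000
    print(f"5,000 idle employees for one hour: ${idle:,.0f}")
```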
Finding the Root Cause Is Hard
However, here’s the problem. Getting to the real root cause the first time isn’t easy. When a business service goes down – whether that’s your e-commerce portal, contact center or warehouse management system – it’s not likely to be a simple failure. Modern IT service delivery infrastructure is complicated: dozens or even hundreds of IT components work together to deliver a single business service. Is it a storage problem, a network connectivity failure, or a database performance issue? Or has someone fat-fingered an application configuration?
And there’s another huge concern: most IT experts are actually domain experts. They understand a specific IT technology – such as databases or networks – but they don’t have that end-to-end service view. Others may understand the overall picture, but they don’t have the depth to get to the root cause quickly and accurately. No one’s to blame for this – today’s IT technology is just too complex for any one person to understand completely.
What’s the result? Rather than getting right to the real root cause when a business service fails, these domain experts spend time pointing fingers in the war room. Each has their own opinion about what’s wrong – Joe thinks it’s an issue with the web server, Angela thinks it’s the hypervisor and Vikram suspects that it’s an intermittent hardware failure. They try one solution and then another, and meanwhile the clock is ticking.
Wouldn’t it be better if they could automatically get to the root cause of the problem – and fix it? Surely, with modern AIOps technology, there must be a way of pinpointing what’s wrong. What if an AI system could correlate all of the symptoms, carry out targeted tests and come up with the answer?
Automatically Pinpoint the Accurate Root Cause in Seconds
Here’s the good news: automated root cause analysis exists today – and it’s already helping forward-looking IT organizations to quickly and accurately resolve service outages. These types of systems understand a wide range of IT technologies – and know how these technologies work together to deliver a specific business service. They collect and correlate monitoring data, log records, events and other information – using machine learning to trace issues across your service delivery infrastructure, rapidly and accurately determining what’s really wrong.
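One ingredient of this kind of correlation can be sketched without any machine learning: walk the service’s dependency map and separate root-cause candidates from downstream symptoms. The Python below is a simplified, rule-based illustration, not Optanix’s method. The component names and the `DEPENDS_ON` map are hypothetical, and the map is assumed to be acyclic. An alerting component counts as a root-cause candidate only if nothing it depends on, directly or transitively, is also alerting. Every other alert is treated as a symptom.

```python
from collections import deque

# Hypothetical dependency map: each component lists what it directly depends on.
# Assumed to be acyclic (a DAG).
DEPENDS_ON: dict[str, list[str]] = {
    "ecommerce-portal": ["web-server", "orders-db"],
    "web-server": ["vm-web-01"],
    "orders-db": ["vm-db-01", "san-storage"],
    "vm-web-01": ["hypervisor-a"],
    "vm-db-01": ["hypervisor-a"],
    "hypervisor-a": ["core-switch"],
    "san-storage": ["core-switch"],
    "core-switch": [],
}


def upstream(component: str, graph: dict[str, list[str]]) -> set[str]:
    """All components that `component` depends on, directly or transitively (BFS)."""
    seen: set[str] = set()
    queue = deque(graph.get(component, []))
    while queue:
        dep = queue.popleft()
        if dep not in seen:
            seen.add(dep)
            queue.extend(graph.get(dep, []))
    return seen


def root_cause_candidates(alerting: set[str], graph: dict[str, list[str]]) -> set[str]:
    """Alerting components with no alerting component anywhere beneath them.

    If an alert can be explained by a failing dependency further down the
    stack, it is treated as a symptom; otherwise it is a root-cause candidate.
    """
    return {c for c in alerting if not (upstream(c, graph) & alerting)}


if __name__ == "__main__":
    alerts = {"ecommerce-portal", "web-server", "orders-db",
              "vm-web-01", "vm-db-01", "hypervisor-a"}
    print(root_cause_candidates(alerts, DEPENDS_ON))  # prints: {'hypervisor-a'}
```

In this example, six simultaneous alarms collapse to a single candidate: the hypervisor that the web and database VMs share. A production system would also weigh timing, log records and learned patterns, and would confirm a candidate with targeted tests rather than relying on topology alone.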
So, just how good are automated root cause analysis systems? Our own experience at Optanix has shown that automated root cause analysis systems identify the root cause of business service issues in 30 seconds or less. What’s more, they are astonishingly accurate, delivering a first-time fix rate of more than 90%. The result is that outage times are typically cut in half: instead of wasting time trying to find the issue, you can start fixing it right away. That can shave 6 hours or more off the restoration time. At Gartner’s $5,600-per-minute average, that cuts the typical business impact by roughly $2 million.
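The restoration-time figure can be checked the same way. The sketch below assumes a flat $5,600-per-minute cost, which is the Gartner average cited earlier. It also assumes a hypothetical 12-hour outage, so that halving it saves 6 hours. Real outage costs are rarely this linear, and the function names are illustrative only.

```python
# Illustrative savings model; assumes a flat per-minute downtime cost.
COST_PER_MINUTE = 5_600  # USD, Gartner average for unplanned downtime


def outage_cost(hours: float, cost_per_minute: float = COST_PER_MINUTE) -> float:
    """Business impact of an outage lasting `hours` at a flat per-minute cost."""
    return hours * 60 * cost_per_minute


def savings_from_faster_restoration(outage_hours: float, reduction: float) -> float:
    """Cost avoided when the outage is shortened by the fraction `reduction`."""
    return outage_cost(outage_hours) - outage_cost(outage_hours * (1 - reduction))


if __name__ == "__main__":
    # Hypothetical 12-hour outage cut in half, i.e. 6 hours saved.
    print(f"${savings_from_faster_restoration(12, 0.5):,.0f}")  # prints: $2,016,000
```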
Automated Root Cause Analysis Fixes Issues Before They Become Disasters
And there’s a further key benefit of automated root cause analysis. Think about Peter again. If the doctor had diagnosed his appendicitis the first time, Peter’s illness could have been far less severe. It’s the same with IT issues. The goal is to identify and resolve issues before they get worse and have a major business impact. Automated root cause analysis can spot the early symptoms of emerging problems, giving you time to address them before your employees and customers are affected. At Optanix, we’ve seen automated root cause analysis applications achieve proactive fix rates as high as 95%.
Here’s the bottom line. Diagnosing service outages manually takes a huge amount of time and effort – and it’s usually not accurate. Every hour spent trying to find the root cause is an hour wasted. And this delay can cost your business millions of dollars for a single outage. By using automated, intelligent technology, you can reduce this time from hours to seconds.
Trust me, Peter and his appendix will thank you.