AIOps Use Cases for Enterprise IT Operations
AIOps vendors and platforms rely on machine learning, data aggregation, analytics and automation to deliver real-world benefits. These benefits are realized primarily in two areas:
- The ability to derive insights into the state of the infrastructure
- The ability to act upon these insights rapidly to provide truly proactive ITOM capabilities
In this blog, we’ll look at a real-world AIOps use case that is already providing these benefits in an enterprise IT operations setting.
SDDCs Are Now Viable
It is now more necessary than ever to implement an approach to IT management that can react to the faster rate at which services are spun up, while also keeping pace with the constantly morphing nature of the service delivery infrastructure.
With the advent of virtualization and business logic overlays, software-defined data centers (SDDCs) are becoming increasingly viable. This is because they provide increased efficiencies by matching the business requirements of a particular service to the resources needed to provision that service and to provide resiliency.
SDDCs can rapidly scale up or down as requests arise. They also provide elasticity to the infrastructure that supports business services. All this adds up to a more dynamic infrastructure in terms of the computing, storage and network components deployed at any particular point in time.
Traditional ITOM practices are more infrastructure-centric, so it takes a different mindset to properly manage the SDDC paradigm. Since the business needs drive the infrastructure, SDDC requires a service-centric approach.
This is where AIOps solutions excel. Take, for example, the following before-and-after AIOps use case scenario: a global airline that promotes a one-day ticket price sale.
A Case of Overload – An AIOps Case Study
This AIOps case study follows an airline promotion that was scheduled well in advance of its launch. Plans were made to cope with the anticipated increase in the resources needed – phone connections, web server space and so on – to ensure callers would have a satisfactory customer experience.
But the best laid plans often go awry. On the day of the event, a huge storm system affected air traffic. Delays and cancellations began to accumulate. This resulted in an unexpected spike in web and phone traffic. Customers made repeated status calls and inquiries and began searching for alternate flight bookings.
Since the sales promotion was running in parallel, the system was taxed. It wasn’t long before the demands imposed on the infrastructure began to overwhelm capacity and service performance began to degrade.
The issues were not recognized quickly enough, so action came too late. The result was a frustrating customer experience for both travelers and promotion respondents.
As online services were impacted, the workload shifted to employees in the terminals who had to work overtime to rebook thousands of customers. Tragically, all of the marketing effort around the promotion was wasted. Overall, the whole situation presented a negative outcome for the business.
Below, we’ll dig into this AIOps use case to show exactly how AIOps is impacting the IT industry.
Before the AIOps Solution
The airline was using services that were state-of-the-art at the time.
Every ITOM platform provides some level of threshold monitoring. Essentially, this consists of monitoring systems that generate alerts when a metric deviates from its baseline performance. Some platforms can also produce time-to-threshold reports, which show how quickly a metric is approaching its preset alarm level.
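To make the time-to-threshold idea concrete, here is a minimal sketch in Python. It assumes the metric grows roughly linearly over the sampling window and fits a straight line by ordinary least squares. The function name `time_to_threshold` and the (minute, value) sample format are illustrative assumptions, not part of any particular ITOM product.

```python
from typing import Optional, Sequence, Tuple


def time_to_threshold(
    samples: Sequence[Tuple[float, float]], threshold: float
) -> Optional[float]:
    """Estimate the minutes left until a metric reaches `threshold`.

    `samples` holds (minute, value) pairs in time order. A straight line is
    fitted by least squares and extrapolated forward from the last sample.
    Returns 0.0 if the threshold is already breached and None if the trend
    is flat or falling (no crossing is predicted).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")

    last_minute, last_value = samples[-1]
    if last_value >= threshold:
        return 0.0

    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")

    # Least-squares slope: metric units gained per minute.
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope <= 0:
        return None

    intercept = mean_v - slope * mean_t
    crossing_minute = (threshold - intercept) / slope
    return max(crossing_minute - last_minute, 0.0)
```

A real platform would likely use a more robust forecast (seasonality, confidence bands), but the output has the same shape: a number of minutes that can be attached to an alert.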
This data is then shared via a reporting feed – or emails and texts – to let the right person (or team) know so they can take prompt action. For obvious reasons, such an approach works better in an environment where it is possible to determine what constitutes “typical” behavior.
In the airline scenario, the IT operations platform raised an alert as capacity started maxing out. But it could not relate that alert to the business services affected, namely the promotion and the travel rebooking. It triggered a workflow that was slow and required manual intervention to respond to the situation caused by the storm.
After the AIOps Solution
Today, AIOps solutions enable airlines to do things differently and more efficiently. They run promotions year-round, regardless of the weather. Let’s look at the approach that improved the outcome:
Airlines now have the ability to provide contextual insights. They still use time-to-threshold analysis, but the reports now correlate infrastructure metrics with business services and performance data.
Infrastructure events and correlated performance issues are immediately routed to the IT operations team as a high priority. This ranking is calculated from the potential revenue impact of reduced availability of the ticketing system due to performance problems or outages.
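One way such a ranking could be computed is sketched below. The scoring formula (revenue per minute × fraction of capacity lost × expected duration) and every field name are assumptions made for illustration; the article does not specify how any given platform calculates revenue impact.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ServiceEvent:
    """An infrastructure event already correlated to a business service."""

    name: str
    service: str
    revenue_per_minute: float  # assumed: revenue the service normally earns
    capacity_loss: float       # assumed: fraction of capacity lost, 0.0 to 1.0
    expected_minutes: float    # assumed: duration if nobody intervenes

    @property
    def revenue_at_risk(self) -> float:
        # Hypothetical scoring rule, not a vendor formula.
        return self.revenue_per_minute * self.capacity_loss * self.expected_minutes


def rank_events(events: Iterable[ServiceEvent]) -> List[ServiceEvent]:
    """Return events ordered from highest to lowest revenue at risk."""
    return sorted(events, key=lambda e: e.revenue_at_risk, reverse=True)
```

With a rule like this, a partial slowdown of the ticketing system can outrank a full outage of a low-revenue service, which is the service-centric behavior described above.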
This gives the team the ability to kick off orchestration (directly or via APIs) to modify commit rates for the needed bandwidth, spin up servers to add to the load-balancing pool, and provision resources to offset the database workload. Some examples of the insights that drive these actions:
- Internet bandwidth on a specific path will max out at the current rate in X minutes
- Database resources based on CPU, disk and memory will fail in Y minutes
- Load balancer thresholds determine that, based on the current traffic requests, servers need to be added to the web pool
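The load balancer example in the list above can be sketched as a simple sizing calculation. This is a hedged illustration: the function name, the per-server capacity figure and the target utilization are all assumptions, and a production system would hand the result to whatever orchestration API it actually uses.

```python
import math


def servers_to_add(
    requests_per_second: float,
    pool_size: int,
    capacity_per_server: float,
    target_utilization: float = 0.7,
) -> int:
    """Return how many servers to add so the pool runs at or below target.

    `capacity_per_server` is the request rate one server can sustain at 100%
    load; `target_utilization` leaves headroom for further spikes. Both are
    assumed inputs that would come from load testing in practice.
    """
    if capacity_per_server <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("capacity must be positive and utilization in (0, 1]")

    usable_per_server = capacity_per_server * target_utilization
    required = math.ceil(requests_per_second / usable_per_server)
    # Never return a negative number: this sketch only scales up.
    return max(required - pool_size, 0)
```

For instance, at 4,200 requests per second with four servers rated at 1,000 requests per second each and a 50% utilization target, the pool needs nine servers, so five would be added.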
Coupled with the airlines’ SDDC capabilities, these actions are automated. As noted earlier, SDDCs require a service-centric approach, and here that approach produces a completely different outcome: steps are taken to avoid service degradation before the business service is impacted. This is just one AIOps use case that illustrates how AIOps can drive a truly proactive approach to ITOM.
The airlines’ AIOps platforms now enable the IT operations team to address evolving situations before they get out of hand.