AI and ML are already here. Many of today's integrated management systems are incorporating ever-increasing levels of machine learning (ML) and rudimentary artificial intelligence (AI) capabilities to build their automatic remediation workflow libraries and knowledge bases.
ITIL, the IT Infrastructure Library, has been a steadfast IT services management standard for decades. How do AI and ML jibe with ITIL?
ITIL is an integrated, codified, process-based framework for managing IT services. It was designed to help organizations better integrate and align IT with their business needs and strategies. Part of the original intent behind ITIL’s development was to make IT explainable – and therefore justifiable – to other parts of the business. It did this by providing structured methods and objective metrics for IT performance.
Because ITIL resembles other basic business process models, non-IT people have an easier time following along with otherwise arcane details. By providing data by which to judge things such as ticket resolution, development progress, incident response and infrastructure improvements, the ITIL framework can help give businesses metrics on which to base strategic decisions.
On the flip side, ITIL can also add significant process and personnel overhead and may create knowledge and operational silos. Overall, however, it is considered to be a highly effective philosophy for IT service management.
The Benefits of Machine Learning
IT professionals have always worked hard to improve efficiencies and effectiveness. They have enthusiastically adopted best practice processes and used available technology to enable workflow automation, knowledge management, remote control and business intelligence.
We are now entering a new phase, where we can cede some of the IT management effort to machines. ML will initially influence the automation of tasks, where algorithms can be used to understand patterns and context to decide the best course of action without human input. The results and benefits include:
- Greater speed and efficiency: Machines are quicker than people and they can also work 24/7 without getting tired.
- Reduced costs: People costs are still a large part of the overall IT department budget. While technology isn’t necessarily cheap, the cost of ML-based automation can be more than covered by the people costs it saves.
- Better people utilization: With ML solving mundane and repetitive tasks, time-constrained human brains are freed up to work on bigger issues, such as strategy and planning.
- Reduction in human error: People make mistakes. It doesn’t matter if they are inexperienced, rushing or tired – these mistakes might have an adverse impact on the business. ML-based automation has the potential to make far fewer mistakes.
- Better customer experience: Automation can yield speedier delivery of services at lower cost. And when things do go wrong, the people in customer service and support have an ace in the hole in the form of automation, resulting in a better overall experience for the customer.
Aligning ML and ITIL
So how do we reap the benefits of machine learning in our ITSM environment while not abandoning the benefits delivered by the structure of ITIL?
Many of the best ITSM tools integrate nicely with ITIL processes. There may be overlap between the individual modules of the toolsets and the ITIL concepts of Problem Management, Incident Management and Event Management, but the essential functionality fits well into the conceptual framework.
For example, many systems have a MOM (Manager of Managers) that provides a single console to manage alerts sent from multiple monitoring tools. This is a specific recommendation from ITIL Event Management. ITIL defines the benefits as:
- Reduction in the time to identify problems by looking in one place – as opposed to having many disparate dashboards, each representing one source for the issue.
- Reduction in confusion, by transforming events from different sources and various formats into one consistent format.
- Reduction in the time needed to resolve problems by managing alerts in one console.
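The second benefit, one consistent event format, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Event` fields, the two monitoring tools ("tool A" and "tool B") and their payload keys are hypothetical and do not come from ITIL or from any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """One consistent event format for the single console (hypothetical fields)."""
    source: str
    host: str
    severity: str  # "info", "warning" or "critical"
    message: str
    timestamp: datetime


def from_tool_a(raw: dict) -> Event:
    # Hypothetical tool A reports numeric levels and epoch seconds.
    levels = {0: "info", 1: "warning", 2: "critical"}
    return Event(
        source="tool_a",
        host=raw["hostname"],
        severity=levels.get(raw["level"], "info"),
        message=raw["text"],
        timestamp=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
    )


def from_tool_b(raw: dict) -> Event:
    # Hypothetical tool B reports upper-case severities and ISO 8601 times.
    return Event(
        source="tool_b",
        host=raw["node"],
        severity=raw["sev"].lower(),
        message=raw["summary"],
        timestamp=datetime.fromisoformat(raw["time"]),
    )


def console_view(events: list[Event]) -> list[Event]:
    # A single list for the operator, newest event first.
    return sorted(events, key=lambda e: e.timestamp, reverse=True)
```

Once every source is mapped into the same structure, sorting, filtering and correlation only need to be written once, which is where the reduction in confusion comes from.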
Success will require an implementation that addresses the following issues. The implementation must be able to:
- Reconcile the inherent challenges in sanity-checking and maintaining the necessary integrations and the automations themselves.
- Support "infrastructure as code," so that the integrations and automations are treated as code and can therefore be versioned, reviewed and maintained far more easily.
- Manage a potential loss of context on the part of the human operators that might lead to a loss of trust. Systems need to be transparent and proactively keep humans in the loop. For example, automation systems can interact with operators via chat as a peer. In addition, the workflows themselves must be capable of opening and updating tickets in the tracking system.
- Avert the risk of runaway automations or flapping. Any control system has to be able to control itself. Auto remediation systems must have the ability to limit responses to given sources of events. For example, they must ensure that they do not spawn a cycle of remediations that remediate remediations.
- Scale to today’s environments. Prior IT management systems automated much less dynamic environments that were orders of magnitude smaller than today’s. Modern auto remediation needs to scale horizontally and typically incorporates a message queue and other techniques to achieve this scale.
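The requirement to avert runaway automations can be sketched as a small guard that sits in front of the remediation workflows. This is only one possible approach, a sliding-window limit per event source plus a flag for events that a remediation itself produced. The class name `RemediationGuard`, its parameters and the `caused_by_remediation` flag are hypothetical.

```python
import time
from collections import defaultdict, deque


class RemediationGuard:
    """Limits how often remediation may run for a given event source."""

    def __init__(self, max_runs: int, window_seconds: float, clock=time.monotonic):
        self.max_runs = max_runs
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the guard can be tested without waiting
        self._runs = defaultdict(deque)  # source -> timestamps of recent runs

    def allow(self, source: str, caused_by_remediation: bool = False) -> bool:
        # Never remediate an event that a remediation itself produced.
        if caused_by_remediation:
            return False
        now = self.clock()
        runs = self._runs[source]
        # Drop runs that have aged out of the sliding window.
        while runs and now - runs[0] >= self.window_seconds:
            runs.popleft()
        if len(runs) >= self.max_runs:
            return False  # the source looks like it is flapping
        runs.append(now)
        return True
```

When `allow` returns `False`, a sensible next step would be to open or update a ticket and hand the event to a human operator rather than retry, which also keeps people in the loop.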
A Vastly Smarter Management Environment
If the implementation can meet these requirements, we can reap the benefits of a vastly smarter management environment – combined with the structure that the ITIL philosophy provides. In the near future, we can look forward to many new benefits:
- Identifying and predicting issues and problems, along with the most likely resolutions – which in turn will lead to predictive maintenance and less reactive fixing.
- Better understanding the risks of proposed changes.
- Predicting demand, which will lead to effective capacity planning.
- Greater access to known solutions. Workflow libraries will be pooled, and the automation layer will be smart enough to adapt – and update – the workflow to accommodate local conditions.
- Better knowledge management and documentation production. Machine learning can be employed to identify gaps where knowledge articles are missing and to create new articles automatically from existing tickets (e.g., the already documented resolutions).
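To make the demand-prediction item a little more concrete, here is a deliberately simple stand-in: a least-squares straight line fitted to past usage and extrapolated forward. A real ML-based system would use far richer models and data. The function name and the sample figures are illustrative assumptions only.

```python
def linear_forecast(history: list[float], steps_ahead: int) -> float:
    """Fit y = a + b*x by least squares and extrapolate steps_ahead periods."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    # Slope and intercept of the ordinary least-squares line.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    # The last observation sits at x = n - 1, so look steps_ahead beyond it.
    return intercept + slope * (n - 1 + steps_ahead)


# Made-up monthly storage usage in GB, growing by 10 GB per month.
usage = [100.0, 110.0, 120.0, 130.0]
projected = linear_forecast(usage, steps_ahead=2)
```

Comparing a projection like this against provisioned capacity is the basic question capacity planning has to answer: when will demand exceed what is installed?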