ITIL Event Management – The Launching Pad for a Successful Service Operation Strategy
Within the context of ITIL, Event Management is the set of practices that monitors the IT infrastructure to help keep it operating normally. It also serves as the starting point for many Service Operation processes and activities.
ITIL (formerly an acronym for ‘Information Technology Infrastructure Library’ but now a standalone term) provides a detailed set of best practices to facilitate IT Service Management.
Along with Incident and Problem Management, Event Management is one of the core components of a successful ITIL Service Operation strategy. In a recent blog post, we discussed the differences between these areas. To recap:
- Event Management looks for things that are going awry
- Problem Management works on understanding root causes and preventing issues from recurring
- Incident Management is all about restoring the service once normal operation is disrupted
Definition of an Event
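So, what is an ‘event’ anyway? An event is any change of state that is significant for the management of a service or configuration item (CI). In practice, events typically surface as notifications created by an IT service, a CI, or a monitoring tool.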
Monitoring is the key to effective Event Management, and it is worth understanding the difference between the two. Monitoring focuses on tracking the state of a configuration item or service and generating notifications accordingly. Event Management, on the other hand, is the set of processes and activities that determines what happens next.
Monitoring tools generate enormous volumes of notifications. To avoid being overwhelmed, you need an automated approach to processing them.
The Event Management Lifecycle
Automated ITIL Event Management systems start by filtering out the noise. Then they analyze the remaining events to help determine which events have a genuine service impact or point towards an impending degradation of service.
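The filtering and analysis steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular tool works: it assumes "noise" means repeated notifications for the same CI and check within a fixed time window, and it approximates "genuine service impact" with a hypothetical `service_map` that lists the business services each CI supports. The `Event` fields, the 300-second window, and the sample data are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str       # configuration item (CI) that emitted the notification
    check: str        # what was measured, e.g. "cpu"
    value: float      # value reported by the monitoring tool
    timestamp: float  # seconds since an arbitrary epoch


def filter_noise(events, window_seconds=300.0):
    """Suppress repeats: keep at most one event per (source, check) per window."""
    last_kept = {}
    kept = []
    for event in sorted(events, key=lambda e: e.timestamp):
        key = (event.source, event.check)
        previous = last_kept.get(key)
        if previous is None or event.timestamp - previous >= window_seconds:
            kept.append(event)
            last_kept[key] = event.timestamp
    return kept


def with_service_impact(events, service_map):
    """Keep only events from CIs that support at least one business service."""
    return [e for e in events if service_map.get(e.source)]


events = [
    Event("db01", "cpu", 97.0, 0.0),
    Event("db01", "cpu", 98.0, 60.0),    # repeat inside the window: suppressed
    Event("lab-vm", "cpu", 99.0, 90.0),  # CI supports no service: dropped
    Event("db01", "cpu", 96.0, 400.0),   # window has elapsed: kept
]
service_map = {"db01": ["billing"], "lab-vm": []}  # hypothetical CI-to-service map

relevant = with_service_impact(filter_noise(events), service_map)
print([(e.source, e.timestamp) for e in relevant])  # [('db01', 0.0), ('db01', 400.0)]
```

Real platforms typically add correlation across CIs and topology-aware impact analysis; the fixed window here is just the simplest form of deduplication.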
Once the noise has been filtered out and the remaining events analyzed, the next stage is classification. Events are typically classified based on their significance or service impact:
- An Informational event does not require any action. This type of event is recorded in log files and stored for a predefined period of time. Informational events may also be used by other systems, such as analytics programs, to gain historical insights.
- A Warning event is generated when a device or service is approaching a predetermined threshold. This type of event can be used to notify the appropriate IT team or to trigger a process or tool that prevents an exception from occurring.
- An Exception event means there is a problem, such as a failure to deliver a business capability, a degradation of service, or some other loss of functionality. This typically results in an incident being logged for resolution.
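As a simplified sketch of the three event types described above, the function below classifies a single numeric reading against two thresholds. It assumes higher readings are worse (for example, utilisation in percent) and treats crossing the upper threshold as service-affecting; real classification usually weighs service context, not just one metric. The threshold values in the example are hypothetical.

```python
from enum import Enum


class EventClass(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    EXCEPTION = "exception"


def classify(value, warning_at, exception_at):
    """Map a metric reading onto the three event types.

    Assumes higher readings are worse (e.g. utilisation in percent) and that
    crossing exception_at means service delivery is affected.
    """
    if warning_at >= exception_at:
        raise ValueError("warning_at must be below exception_at")
    if value >= exception_at:
        return EventClass.EXCEPTION
    if value >= warning_at:
        return EventClass.WARNING
    return EventClass.INFORMATIONAL


for reading in (42.0, 85.0, 99.0):
    # hypothetical thresholds: warn at 80%, treat 95% and above as an exception
    print(reading, classify(reading, warning_at=80.0, exception_at=95.0).value)
```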
Once an event has triggered the appropriate response, the lifecycle continues to the Closure phase. Informational events are usually just logged and closed, but warnings and exceptions call for further intervention, investigation, and action before closure. That action might be notifying the appropriate team or initiating automation that carries out specific remediation steps.
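The response-and-closure logic described here might look like the following minimal sketch. The `notify_team` and `open_incident` callbacks are hypothetical stand-ins for a real notification or ticketing integration, and the status strings are illustrative assumptions. Informational events close immediately, while warnings and exceptions stay in progress until a follow-up action is recorded.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EventRecord:
    event_id: str
    event_class: str                 # "informational", "warning" or "exception"
    status: str = "open"
    incident_id: Optional[str] = None
    actions: List[str] = field(default_factory=list)


def respond(record, notify_team, open_incident):
    """Trigger the response the event class calls for."""
    if record.event_class == "informational":
        record.actions.append("logged")
        record.status = "closed"        # nothing further to do
    elif record.event_class == "warning":
        notify_team(record)
        record.actions.append("team notified")
        record.status = "in progress"   # closed once the team or automation acts
    elif record.event_class == "exception":
        record.incident_id = open_incident(record)
        record.actions.append(f"incident {record.incident_id} opened")
        record.status = "in progress"   # closed once the linked incident is resolved
    else:
        raise ValueError(f"unknown event class: {record.event_class}")
    return record


def close(record, resolution):
    """Record the outcome of the follow-up action and close the event."""
    record.actions.append(resolution)
    record.status = "closed"
    return record


# Stub integrations standing in for a real notification and ticketing system.
notified = []
incidents = []


def notify_team(record):
    notified.append(record.event_id)


def open_incident(record):
    incidents.append(record.event_id)
    return f"INC-{len(incidents):04d}"


info = respond(EventRecord("E1", "informational"), notify_team, open_incident)
outage = respond(EventRecord("E2", "exception"), notify_team, open_incident)
print(info.status, outage.status, outage.incident_id)  # closed in progress INC-0001
close(outage, "incident resolved")
print(outage.status, outage.actions)  # closed ['incident INC-0001 opened', 'incident resolved']
```

Passing the integrations in as callbacks keeps the routing logic independent of any specific ITSM tool.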
Beyond Incident and Problem Management
ITIL best practices call for Event Management to serve as the trigger that kicks off the Incident Management process while also playing a major role in Problem Management. Incident Management aims to restore service as quickly as possible. Problem Management focuses on determining why an incident happened in the first place and on keeping it from recurring. But the interaction with Incident and Problem Management is just the beginning.
The integration between Event Management and the other ITIL processes plays an important role in the effective delivery of systems and services in the overall IT Operations strategy:
- For Capacity Management and Availability Management, Event Management can be tuned to signal status changes and exceptions so that these processes can proactively determine appropriate responses before KPIs are compromised.
- Configuration Management uses events to determine the current status of devices and services and to react if any unauthorized change activity is taking place in the infrastructure.
- Asset Management can use events to determine lifecycle status. Events can confirm that a newly added device or service has been properly configured and is now operating satisfactorily.
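To illustrate the Configuration Management case, here is a minimal sketch that checks configuration-change events against a set of approved change records. The CMDB is modelled as a plain dictionary, and the rule "a change is authorized only if it cites an approved change ID" is an assumption for illustration; the CI names, attributes, and change IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConfigChangeEvent:
    ci: str                          # configuration item that changed
    attribute: str                   # e.g. "os_version"
    new_value: str
    change_id: Optional[str] = None  # approved change record cited, if any


def process_changes(events, cmdb, approved_changes):
    """Apply authorized changes to the CMDB; return the unauthorized ones."""
    flagged = []
    for event in events:
        if event.change_id is None or event.change_id not in approved_changes:
            flagged.append(event)  # no approved change: hand off for investigation
        else:
            cmdb.setdefault(event.ci, {})[event.attribute] = event.new_value
    return flagged


cmdb = {"web01": {"os_version": "22.04"}}   # hypothetical CMDB contents
approved_changes = {"CHG-100"}              # hypothetical change records
events = [
    ConfigChangeEvent("web01", "os_version", "24.04", "CHG-100"),
    ConfigChangeEvent("web01", "open_port", "23"),
]
flagged = process_changes(events, cmdb, approved_changes)
print(cmdb["web01"])                     # {'os_version': '24.04'}
print([e.attribute for e in flagged])    # ['open_port']
```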
The core value of Event Management lies in enabling early detection, and even prediction, of incidents, and in supporting Problem Management. It also serves as the kickoff point for automation and as the entry point for several service delivery activities. When effectively integrated with other processes, it creates efficiencies that benefit the whole business through stronger Service Operation capabilities.
ITIL Event Management Opportunities
Multiple monitoring tools are often used to manage across technology domains, which results in a fragmented view of the overall picture and inhibits collaboration across your teams. The effectiveness of your Event Management system depends directly on how you handle those monitoring tools, and the central challenge is the sheer volume of notifications they generate.
Today’s most advanced IT operations platforms improve event analysis because they can process large amounts of data and provide a consistent view across tools. The insights these platforms produce can strengthen your Event Management capabilities. In the near term, AIOps (artificial intelligence for IT operations) is poised to extract even more value from large volumes of data, using it to safeguard availability and performance through predictive analysis, better root cause determination, and automation.
Ask your platform provider how they plan to use AIOps to improve your business outcomes. Make sure they recognize its potential to provide dynamic thresholds, enhance predictive analysis, and drive advanced automation for deeper troubleshooting and root cause analysis.
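To make the idea of a dynamic threshold concrete, here is one simple approach: instead of a fixed limit, flag a reading when it exceeds the mean plus `k` standard deviations of the most recent readings. This is a minimal sketch using Python's `statistics` module, not how any particular AIOps product works; commercial platforms may use quite different and more sophisticated models. The window size, `k`, and sample data are assumptions.

```python
from collections import deque
from statistics import mean, stdev


class DynamicThreshold:
    """Flag readings far above a rolling baseline: mean + k * stdev of recent values."""

    def __init__(self, window=30, k=3.0, min_samples=10):
        if min_samples < 2 or min_samples > window:
            raise ValueError("need 2 <= min_samples <= window")
        self.history = deque(maxlen=window)
        self.k = k
        self.min_samples = min_samples

    def check(self, value):
        """Return True if value breaches the current limit, then add it to the baseline."""
        breach = False
        if len(self.history) >= self.min_samples:
            limit = mean(self.history) + self.k * stdev(self.history)
            breach = value > limit
        self.history.append(value)
        return breach


detector = DynamicThreshold(window=20, k=3.0, min_samples=10)
baseline = [50, 52, 49, 51, 50, 53, 48, 50, 51, 49]
flags = [detector.check(v) for v in baseline]  # still learning: no flags yet
normal = detector.check(52)  # within normal variation
spike = detector.check(90)   # far above the learned baseline
print(any(flags), normal, spike)  # False False True
```

Because every reading, including a spike, joins the baseline, a sustained shift in load is gradually learned; a variant could exclude flagged readings so that outliers do not inflate the limit.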