6 Tips to Help You Improve Incident Management

Constantly improving incident management will ensure you can deliver the best possible outcome with the lowest possible use of resources.

ITSM thought leader Stuart Rance offers tips to help improve your incident management, including:

Focusing on customer experience and business value, rather than process.
Making clear separation between your incidents and problems.
Encouraging staff to create knowledge articles that will help others to resolve incidents.
Reviewing your incident categories to be sure they’re all effective and that your stakeholders need them.


6 Tips to Help You Improve Incident Management

by Stuart Rance

Incident management is often the first IT service management (ITSM) practice that an IT organization adopts, and many of my clients have a well-established and mature incident management practice. This doesn’t mean that there’s no opportunity to improve, though: there are always things that could be done better, and opportunities to learn from experience. The best ITSM organizations are the ones that recognize that improvement never finishes.

Incident management can be costly: the service desk may employ many people and use expensive telephony and ITSM tools; second and third line support often include experienced (and expensive) staff; and vendor support contracts can be expensive. It can also be very difficult to demonstrate the value that the business gets from money spent on incident management. Even if incident management is perfect, every incident will still be seen by your customers as a failure on the part of IT. Achieving agreed targets for service restoration may limit the damage done by IT failures, but your customers would rather the IT hadn’t let them down in the first place. This combination of high cost and low perceived value provides some real challenges to the incident manager, and makes incident management an important area of focus for improvement efforts.

What Is the Purpose of Incident Management?

It’s important to identify the purpose of a practice such as incident management, as this can help to influence how you design and operate the process, configure and manage the tools, train and manage the people, and interact with suppliers.

ITIL® (the most widely used framework for ITSM) says that the purpose of incident management is to “minimize the negative impact of incidents by restoring normal service operation as quickly as possible.” This is true as far as it goes, but I think that taking some other things into account as well will help you to do a better job.

Here are a few ideas about what incident management is for, which I have found useful. You may want to adopt some of them, and maybe you can add some more of your own.

Once you have decided what your incident management process needs to focus on, you can review what you do, and how you measure and report it, so that you can make improvements. If you are not clear about what you are trying to achieve then it can be very hard to identify worthwhile improvements. Here are some tips to help you improve your incident management:

Tip 1 – Focus on Customer Experience and Business Value, Rather than Process

When I was just 16, I got a job in a call centre, taking phone calls from people whose gas appliances wouldn’t work. I was given strict instructions to ask each customer for their name, address, and phone number – which I entered into a form on a computer screen – and then tell the customer that a fitter would visit them within four days. It was important for me to handle each call quickly as there were many calls to take each day. On my first morning I took a call from a very angry customer who said that his factory boiler had blown up, the foreman was in the hospital, the factory was unusable and all the workers had been sent home. I took his details and then told him, as I had been taught, “a fitter will call within four days”. Shortly after the call, my manager came to see me and I got told off for doing exactly what I had been instructed!

When I look back on this story, I am shocked that I could have been so unaware not just of the business context, but also of the human and emotional significance of that phone call from the customer. This was a formative experience for me. Like many junior service desk people, I had been given a process to follow and told that I must follow the rules. But what I learned was that you always need to think beyond the process. You need to understand what the process is for, so that you can recognise when not following it is the right thing to do. It’s easy to teach service desk people to follow an exact process; it’s much harder to teach them to understand customer experience, and to use their judgement to identify when it really is better not to follow the process. Nevertheless, this is what must happen if you want a service desk that delights your customers.

I had the exact opposite experience some years ago, when I arrived at a British Airways desk in an airport and told them that I had a non-changeable ticket to London from a different airport on a different date, but that I had just heard that my mother had died and I needed to get home immediately. The woman on the desk simply issued me a new ticket and put me on the next flight home. What impressed me was not just what she did, but that she clearly had the authority to do it.

A focus on customer experience is not just the responsibility of junior service desk people. It needs to be pervasive throughout the IT organization.

Tip 2 – Separate Incidents and Problems

We have been teaching the difference between incidents and problems for many years, but for some reason there still seems to be a lot of confusion. Here’s what you need to internalize:

This means that incident management should be totally focussed on restoring service so that the users can continue to work, with the least possible impact on the business. If you need to investigate why an incident happened, then this should be carried out as a separate problem management activity.

I see many organizations where incidents take a very long time to resolve, because the IT department tries so hard to diagnose and remedy the cause. Often, it would be much better to do whatever is needed to restore the service, and worry about understanding the cause later. For example, it might be better to simply replace a failing laptop, and then erase and restore the original to be used as a spare, rather than spending significant time trying to understand what has gone wrong with the software. This depends on the exact circumstances, of course; I’m not trying to define your detailed incident model for laptop repairs – although I am saying that you should have one.

I do know that many organizations use problem management only when they need to investigate the cause of major incidents, or of problems that have caused large numbers of incidents. This means that investigating the cause of routine incidents becomes an incident management activity, sometimes causing significant delays in restoration of service. So I’ll repeat the point. If you can get the customer working again then you should do so, regardless of whether you can fix the underlying IT issue.
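One way to picture this separation is as two distinct records that are linked but closed independently. The sketch below is only an illustration: the `Incident` and `Problem` classes, their fields, and the record IDs are hypothetical and do not come from any particular ITSM tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Problem:
    """Tracks investigation of an underlying cause (hypothetical model)."""
    problem_id: str
    description: str
    related_incident_ids: list[str] = field(default_factory=list)
    root_cause: Optional[str] = None  # filled in later by problem management


@dataclass
class Incident:
    """Tracks restoration of service for the customer (hypothetical model)."""
    incident_id: str
    summary: str
    resolved: bool = False
    resolution: Optional[str] = None
    problem_id: Optional[str] = None  # a link only; the cause is not investigated here


def restore_service(incident: Incident, action: str) -> None:
    """Resolve the incident as soon as the customer can work again."""
    incident.resolved = True
    incident.resolution = action


def raise_problem(incident: Incident, problem: Problem) -> None:
    """Hand the cause investigation to problem management without reopening the incident."""
    incident.problem_id = problem.problem_id
    problem.related_incident_ids.append(incident.incident_id)


incident = Incident("INC-1", "Laptop crashes on login")
restore_service(incident, "Replaced laptop with a spare")

problem = Problem("PRB-1", "Unexplained crashes on login")
raise_problem(incident, problem)
```

In this sketch the incident is already resolved while `problem.root_cause` is still empty, which is the point: the customer is working again, and the investigation carries on separately.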

Tip 3 – Shift Left

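The term “shift left” is used to describe an approach to incident management where knowledge and training are used to help lower-skilled people resolve incidents that were previously managed by higher-skilled people. The term is based on a simple diagram showing a typical support hierarchy:

[Diagram: a support hierarchy running from left to right through self-service, the service desk, level 2 support, and level 3 (or vendor) support, with arrows pointing left to show incidents shifting to lower-cost levels.]
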
The cost of managing an incident typically increases as we move to the right in this diagram. Self-service is cheapest, followed by the service desk, then level 2 support and finally level 3 (or vendor) support. Shift left is shown by the arrows, indicating that incidents can be managed by a lower cost resource if the knowledge is made available to support this, and staff have been suitably trained.
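The cost argument can be made concrete with some simple arithmetic. The sketch below is purely illustrative: the cost per incident at each level and the incident volumes are made-up numbers, and the function names are hypothetical. You would substitute figures from your own organization.

```python
# Hypothetical cost per incident at each support level, cheapest on the left.
COST_PER_INCIDENT = {
    "self_service": 2,
    "service_desk": 20,
    "level_2": 60,
    "level_3": 150,
}


def total_cost(volumes: dict[str, int]) -> int:
    """Total support cost for the number of incidents handled at each level."""
    return sum(COST_PER_INCIDENT[level] * count for level, count in volumes.items())


def shift_left(volumes: dict[str, int], source: str, target: str, count: int) -> dict[str, int]:
    """Return new volumes with `count` incidents moved from `source` to a cheaper `target`."""
    if COST_PER_INCIDENT[target] >= COST_PER_INCIDENT[source]:
        raise ValueError("target must be a cheaper level than source")
    if count > volumes[source]:
        raise ValueError("cannot move more incidents than the source level handles")
    shifted = dict(volumes)  # copy, so the original volumes are left unchanged
    shifted[source] -= count
    shifted[target] = shifted.get(target, 0) + count
    return shifted


# Made-up monthly volumes: move 80 incidents from level 2 to the service desk.
before = {"self_service": 100, "service_desk": 500, "level_2": 200, "level_3": 50}
after = shift_left(before, "level_2", "service_desk", 80)
saving = total_cost(before) - total_cost(after)
print(saving)  # 3200
```

With these assumed figures, each of the 80 shifted incidents saves the difference between the level 2 and service desk costs. The sketch ignores the one-off effort of creating knowledge and training staff, which you would need to weigh against the saving.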

Running a shift left project can be a real win-win option. Not only does it reduce the cost of delivering IT support, but it also often leads to faster incident resolution, which reduces the business impact of incidents and leads to higher customer satisfaction and lower cost for customers. I have seen shift left run as a major project, consuming lots of resources, and taking a long time to create value, but you can run a shift left project as a fairly low-cost, low-effort initiative:

Tip 4 – Share Knowledge and Information to Help Resolve Incidents

Knowledge management helps you to provide the right knowledge and information, to the people who need it, at the time it will be most valuable to them. You need effective knowledge management to facilitate most things you do, but incident management can really make good use of it. There are two main uses for knowledge in incident management:

In the future, knowledge will also be needed by machine learning and artificial intelligence software so that it can add value to your incident management practice. If this is something you might want to consider, then you need to start collecting and reviewing knowledge now, to ensure there is sufficient high quality knowledge to make the investment worthwhile.

Time spent creating and managing the knowledge and information you make available to your service desk and/or end users can create a huge amount of value compared to the effort invested. But many organizations think of knowledge management as a tool-led exercise. What’s important is to focus on the content rather than on the tools used to manage it. You need to focus on getting your people to help create valuable content by contributing relevant knowledge and information. You also need to ensure your people routinely make use of the valuable content contributed by others.

One way to make knowledge management part of your incident management process is to follow the ideas of Knowledge-Centered Service (KCS), but you could also just run a very simple project that:

You should keep going round this loop until the effort required to share the knowledge and information is greater than the value it provides. You may eventually need to invest in knowledge management tools, but you can do a lot with nothing more than a web site or a file share, so long as your people have the right attitudes, behaviour, and culture.

Tip 5 – Make Effective Use of Incident Categories

Every organization uses categories to help manage their incidents, but the categories that I see in use are often poorly designed and don’t deliver much value. In many cases the categories haven’t been reviewed for years, and even if they were well designed when they were put in place, they are no longer fit for purpose.

Before you review your categories, you should think about what you are going to use them for. Typically, this will be one or more of the following:

Once you have thought about how you use incident categories and how you intend to use them in the future, you should review your existing categories to see how well they achieve your intentions. If you decide that they need to be changed, you will need to meet with all the relevant stakeholders to make some key decisions. You might need to decide:

You can find more tips and guidelines for creating/reorganizing your categories in Joe the IT Guy’s blog The Greatest Ever Code to Incident Categorization.
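As a starting point for a category review, it can help to look at how often each category is actually used. The following is a minimal sketch, assuming you can export a list of the category assigned to each incident. The category names, the 5% threshold, and the function names are all hypothetical.

```python
from collections import Counter


def category_usage(incident_categories: list[str]) -> dict[str, float]:
    """Return each category's share of all incidents, as a fraction between 0 and 1."""
    counts = Counter(incident_categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: count / total for category, count in counts.items()}


def review_candidates(defined: set[str], incident_categories: list[str],
                      rare_share: float = 0.05) -> dict[str, list[str]]:
    """List categories that are never used, or used for less than `rare_share` of incidents."""
    usage = category_usage(incident_categories)
    unused = sorted(defined - set(usage))
    rare = sorted(c for c, share in usage.items() if share < rare_share)
    return {"unused": unused, "rare": rare}


# Made-up export: the category recorded on each of 100 incidents.
recorded = ["Email"] * 50 + ["Network"] * 30 + ["Hardware"] * 19 + ["Printing"] * 1
defined = {"Email", "Network", "Hardware", "Printing", "Fax"}

candidates = review_candidates(defined, recorded)
print(candidates)  # {'unused': ['Fax'], 'rare': ['Printing']}
```

A rarely used category is not necessarily a bad one, so treat the output as a list of questions to take to your stakeholders, not as a list of categories to delete.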

Tip 6 – Make Sure Your Metrics Drive the Behaviours You Want to Encourage

Many of the IT departments that I have worked with create regular customer reports that include large numbers of incident management statistics. A typical report has more incident management metrics than all other metrics combined. When I talked to the customers about these reports, I often found that they only looked at one or two of the metrics, and that they ignored almost all of the data in the report.

I have also seen incident management metrics that encourage service desk agents to behave in ways that neither the customer nor the service provider really wants. For example, if you have a target that agents will close 80% of incidents within 5 minutes, then this may cause them to deliver very poor quality diagnosis and incident resolution, in an attempt to meet the target.

My preferred approach to defining key performance indicators (KPIs) for incident management (and for any other practice) is to start by defining the critical success factors (CSFs) that you need to achieve. Then define a small number of KPIs to support each CSF.

For example, you might have a CSF that says “We resolve incidents quickly, so that they don’t have a significant impact on our customers”. You could support this with KPIs such as “Percentage of incidents closed within SLA times, by priority”, or “Number of priority 1 and 2 incidents that occurred in a 12 month period”. When you create customer reports, you can use the KPIs to show trends, but the discussion with the customer should focus on the CSF – and you should be asking the customer whether it was achieved, not telling them that you delivered the numbers so of course they must be happy.
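To illustrate, the first of these KPIs could be calculated along the following lines. This is a sketch under stated assumptions, not a definitive implementation: the SLA targets, the `ClosedIncident` record, and the sample data are all hypothetical, and your ITSM tool may well report this figure for you.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical SLA resolution targets in minutes, keyed by priority.
SLA_TARGET_MINUTES = {1: 60, 2: 240, 3: 480}


@dataclass
class ClosedIncident:
    priority: int
    minutes_to_resolve: int


def percent_within_sla(incidents: list[ClosedIncident]) -> dict[int, float]:
    """Percentage of closed incidents resolved within the SLA target, per priority."""
    met = defaultdict(int)
    total = defaultdict(int)
    for incident in incidents:
        total[incident.priority] += 1
        if incident.minutes_to_resolve <= SLA_TARGET_MINUTES[incident.priority]:
            met[incident.priority] += 1
    return {priority: 100 * met[priority] / total[priority] for priority in sorted(total)}


# Made-up sample of closed incidents.
closed = [
    ClosedIncident(1, 45), ClosedIncident(1, 90),
    ClosedIncident(2, 200), ClosedIncident(2, 230),
    ClosedIncident(2, 300), ClosedIncident(2, 100),
    ClosedIncident(3, 400),
]

kpi = percent_within_sla(closed)
print(kpi)  # {1: 50.0, 2: 75.0, 3: 100.0}
```

The numbers only show a trend. Whether the CSF was achieved is still a question to ask the customer.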

You can find more suggestions for KPIs that can help you to understand how well you are doing against your goals for incident management in my blog Defining Metrics for Incident Management.

Summary

Incident management can consume many IT resources but it is rarely perceived by customers as something that creates value. Because of this, it is important to constantly improve incident management, to ensure that we deliver the best possible outcome with the lowest possible use of resources.

Customer experience must be the main focus of all incident management activity. There’s no point in providing brilliant technical solutions if we don’t satisfy the customer. The focus on customer experience needs to start with the design of the process and tools, and carry through to recruiting, training, measuring, and rewarding incident management staff, and setting up appropriate relationships with suppliers.

Don’t use incident management to resolve problems. Concentrate on helping your customers to get their work done. Diagnosing and resolving technical issues should be left to problem management, and should not normally be needed to resolve incidents.

Enable your staff to resolve incidents at the lowest level possible, by providing the tools, knowledge, and training that will help them to do this. Whenever level 2 or 3 support handles incidents, they should think about how they could empower less expensive staff to manage these same incidents in the future. The cheapest and fastest way to resolve incidents is often self-service, and you should invest in making this work well for your customers so that they want to use it.

Encourage staff to create knowledge and information that will help others to resolve incidents, and to use the knowledge and information that other people have created. Remember that good knowledge management is more about attitudes, behaviour, and culture than about tools.

If you haven’t reviewed your incident categories recently then it may be time you had a look at them. Make sure you know what you want to use them for, and then design a system that will satisfy all your stakeholders with as few categories as possible.

Ensure that metrics and reporting for incident management meet your needs by considering how they influence staff behaviour, and how well they meet the needs of your customers. Reporting should be based on CSFs, and each CSF should be supported by a small number of KPIs.

If you follow these tips, then you should be able to make improvements in how you manage incidents that will help to reduce costs, improve service levels, and increase customer satisfaction – and you really can’t ask for more than that!