Constantly improving incident management will ensure you can deliver the best possible outcome with the lowest possible use of resources.
by Stuart Rance
Incident management is often the first IT service management (ITSM) practice that an IT organization adopts, and many of my clients have a well-established and mature incident management practice. This doesn’t mean that there’s no opportunity to improve though, as there are always things that could be done better, and opportunities to learn from experience. The best ITSM organizations are the ones that recognize that improvement never finishes.
Incident management can be costly: the service desk may employ many people and use expensive telephony and ITSM tools; second and third line support often include experienced (and expensive) staff; and vendor support contracts can be expensive. It can also be very difficult to demonstrate the value that the business gets from money spent on incident management. Even if incident management is perfect, every incident will still be seen by your customers as a failure on the part of IT. Achieving agreed targets for service restoration may limit the damage done by IT failures, but your customers would rather the IT hadn’t let them down in the first place. This combination of high cost and low perceived value provides some real challenges to the incident manager, and makes incident management an important area of focus for improvement efforts.
It’s important to identify the purpose of a practice such as incident management, as this can help to influence how you design and operate the process, configure and manage the tools, train and manage the people, and interact with suppliers.
ITIL® (the most widely used framework for ITSM) says that the purpose of incident management is to “minimize the negative impact of incidents by restoring normal service operation as quickly as possible.” This is true as far as it goes, but I think that taking some other things into account as well will help you to do a better job.
Here are a few ideas about what incident management is for, which I have found useful. You may want to adopt some of them, and maybe you can add some more of your own.
To resolve incidents quickly, so they have as small an impact on your customers as possible
To prioritize incidents appropriately, in order to address the ones that are most important to your customers first
To communicate well, so that your customers understand what you are doing for them and when their incidents are likely to be resolved
To recognize repeat incidents (that have already happened multiple times), or incidents that you think might repeat in the future, and log problems so that the number and impact of future incidents can be reduced
To make efficient use of both customer resources and service provider resources
Once you have decided what your incident management process needs to focus on, you can review what you do, and how you measure and report it, so that you can make improvements. If you are not clear about what you are trying to achieve then it can be very hard to identify worthwhile improvements. Here are some tips to help you improve your incident management:
When I was just 16, I got a job in a call centre, taking phone calls from people whose gas appliances wouldn’t work. I was given strict instructions to ask each customer for their name, address, and phone number – which I entered into a form on a computer screen – and then tell the customer that a fitter would visit them within four days. It was important for me to handle each call quickly as there were many calls to take each day. On my first morning I took a call from a very angry customer who said that his factory boiler had blown up, the foreman was in the hospital, the factory was unusable and all the workers had been sent home. I took his details and then told him, as I had been taught, “a fitter will call within four days”. Shortly after the call, my manager came to see me and I got told off for doing exactly what I had been instructed!
When I look back on this story, I am shocked that I could have been so unaware not just of the business context, but also of the human and emotional significance of that phone call from the customer. This was a formative experience for me. Like many junior service desk people, I had been given a process to follow and told that I must follow the rules. But what I learned was that you always need to think beyond the process. You need to understand what the process is for, so that you can recognise when not following it is the right thing to do. It’s easy to teach service desk people to follow an exact process; it’s much harder to teach them to understand customer experience, and to use their judgement to identify when it really is better not to follow the process. Nevertheless, this is what must happen if you want a service desk that delights your customers.
I had the exact opposite experience some years ago, when I arrived at a British Airways desk in an airport and told them that I had a non-changeable ticket to London from a different airport on a different date, but that I had just heard that my mother had died and I needed to get home immediately. The woman on the desk simply issued me a new ticket and put me on the next flight home. What impressed me was not just what she did, but that she clearly had the authority to do it.
A focus on customer experience is not just the responsibility of junior service desk people. It needs to be pervasive throughout the IT organization.
When you design your incident management process, you need to consider every point where there is a customer interaction to ensure that you deliver the best possible customer experience. Ask questions like: Can each customer use their preferred channel to communicate with IT? Do the self-service forms make it easy for the customer to provide the information you need? Does the process give the customer timely updates so that they feel reassured about the status of their incident? And so on…
When you recruit and train service desk staff, you must ensure that they have the empathy that is essential for communicating with customers, that they understand the business impact of each incident and how it makes the customer feel, and that they have been empowered to make decisions that go beyond simply following the process when that is the right thing to do. Some of my clients send every service desk person out to work with their customers, either for a few days during their induction or as a regular ½ day activity. This can really help to ensure they understand how IT services are used by the organization.
When you monitor, measure, and reward staff, it’s important to maintain the focus on customer experience – to foster a culture where everyone thinks of customers and their experience all the time.
We have been teaching the difference between incidents and problems for many years, but for some reason there still seems to be a lot of confusion. Here’s what you need to internalize:
An incident is an unplanned interruption to an IT service, or reduction in the quality of an IT service. The purpose of incident management is (as noted above) to restore normal service operation.
A problem is a cause, or potential cause, of one or more incidents. The purpose of problem management is to reduce the likelihood and impact of incidents by identifying actual and potential causes of incidents, and managing workarounds and known errors.
This means that incident management should be totally focussed on restoring service so that the users can continue to work, with the least possible impact on the business. If you need to investigate why an incident happened, then this should be carried out as a separate problem management activity.
I see many organizations where incidents take a very long time to resolve, because the IT department tries so hard to diagnose and remedy the cause. Often, it would be much better to do whatever is needed to restore the service, and worry about understanding the cause later. For example, it might be better to simply replace a failing laptop, and then erase and restore the original to be used as a spare, rather than spending significant time trying to understand what has gone wrong with the software. This depends on the exact circumstances of course; I’m not trying to define your detailed incident model for laptop repairs – although I am saying that you should have one.
I do know that many organizations use problem management only when they need to investigate the cause of major incidents, or of problems that have caused large numbers of incidents. This means that investigating the cause of routine incidents becomes an incident management activity, sometimes causing significant delays in restoration of service. So I’ll repeat the point. If you can get the customer working again then you should do so, regardless of whether you can fix the underlying IT issue.
The term “shift left” is used to describe an approach to incident management where knowledge and training are used to help lower-skilled people resolve incidents that were previously managed by higher-skilled people. The term is based on a simple diagram showing a typical support hierarchy:
The cost of managing an incident typically increases as we move to the right in this diagram. Self-service is cheapest, followed by the service desk, then level 2 support and finally level 3 (or vendor) support. Shift left is shown by the arrows, indicating that incidents can be managed by a lower cost resource if the knowledge is made available to support this, and staff have been suitably trained.
Running a shift left project can be a real win-win option. Not only does it reduce the cost of delivering IT support, but it also often leads to faster incident resolution, which reduces the business impact of incidents and leads to higher customer satisfaction and lower cost for customers. I have seen shift left run as a major project, consuming lots of resources, and taking a long time to create value, but you can run a shift left project as a fairly low-cost, low-effort initiative:
Start by identifying incidents that are using significant resources; typically this will be because they are fairly frequent and always require escalation.
Go to the group that usually resolves these incidents and ask them what would be needed for them to be resolved by the group to their left. This may be access to specific knowledge, or training in how to use a particular tool, or simply practice in executing a procedure.
Get the group currently managing these incidents to put in place the resources required and to hand over responsibility for these incidents to the new group, and then to manage the handover.
Monitor the impact of the changes, to ensure that you really are creating additional value and that customers are happy with the change.
Keep iterating until there are no more incidents for which this is a suitable approach.
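The first step above, finding incidents that are frequent and always escalated, can be sketched in a few lines of Python. This is only an illustration: the record layout, the numeric support levels, the thresholds, and the sample data are all assumptions, and a real analysis would use whatever your ITSM tool can export.

```python
from collections import Counter

# Hypothetical export: (category, support level that resolved the incident).
# Assumed levels: 0 = self-service, 1 = service desk, 2 = level 2, 3 = level 3 or vendor.
incidents = [
    ("Password reset", 1),
    ("VPN access", 2),
    ("VPN access", 2),
    ("VPN access", 2),
    ("Printer fault", 1),
    ("Printer fault", 2),
    ("Database error", 3),
]


def shift_left_candidates(records, min_count=3, min_escalation_rate=0.9):
    """Return categories that are frequent and almost always escalated
    beyond the service desk, most frequent first."""
    totals = Counter(category for category, _ in records)
    escalated = Counter(category for category, level in records if level >= 2)
    candidates = []
    for category, total in totals.items():
        rate = escalated[category] / total
        if total >= min_count and rate >= min_escalation_rate:
            candidates.append((category, total, rate))
    return sorted(candidates, key=lambda c: c[1], reverse=True)


candidates = shift_left_candidates(incidents)
for category, total, rate in candidates:
    print(f"{category}: {total} incidents, {rate:.0%} escalated")
```

A list like this is only a starting point for the conversation with the resolving group; it says nothing about what knowledge or training would be needed to move the work left.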
Knowledge management helps you to provide the right knowledge and information, to the people who need it, at the time it will be most valuable to them. You need effective knowledge management to facilitate most things you do, but incident management can really make good use of it. There are two main uses for knowledge in incident management:
Giving the service desk agents the knowledge and information they need to resolve incidents quickly and efficiently
Providing knowledge and information directly to end users via a self-service portal, so that they can resolve their own incidents when appropriate
In the future, knowledge will also be needed by machine learning and artificial intelligence software so that it can add value to your incident management practice. If this is something you might want to consider, then you need to start collecting and reviewing knowledge now, to ensure there is sufficient high quality knowledge to make the investment worthwhile.
Time spent creating and managing the knowledge and information you make available to your service desk and/or end users can create a huge amount of value compared to the effort invested. But many organizations think of knowledge management as a tool-led exercise. What’s important is to focus on the content rather than on the tools used to manage it. You need to focus on getting your people to help create valuable content by contributing relevant knowledge and information. You also need to ensure your people routinely make use of the valuable content contributed by others.
One way to make knowledge management part of your incident management process is to follow the ideas of Knowledge Centred Service (KCS), but you could also just run a very simple project that:
Gets people to think about what knowledge and information they need to improve how they work
Establishes where to find that knowledge and information
Provides that knowledge and information to the people who need it
Ensures that the newly created knowledge and information will be used when appropriate
You should keep going round this loop until the effort required to share the knowledge and information is greater than the value it provides. You may eventually need to invest in knowledge management tools, but you can do a lot with nothing more than a web site or a file share, so long as your people have the right attitudes, behaviour, and culture.
Every organization uses categories to help manage their incidents, but the categories that I see in use are often poorly designed and don’t deliver much value. In many cases the categories haven’t been reviewed for years, and even if they were well designed when they were put in place, they are no longer fit for purpose.
Before you review your categories, you should think about what you are going to use them for. Typically, this will be one or more of the following:
To help the service desk resolve incidents associated with a particular category by providing a link to a script or set of questions for gathering all the information that is needed
To help route incidents to the correct group for resolution
To identify the correct SLA targets for such incidents
To assist in trend analysis and problem identification
To help create reports that can be used for continual improvement and customer communication
Once you have thought about how you use incident categories and how you intend to use them in the future, you should review your existing categories to see how well they achieve your intentions. If you decide that they need to be changed, you will need to meet with all the relevant stakeholders to make some key decisions. You might need to decide:
How many different categories you want. In my experience organizations often have too many categories; it’s best to start with a small number and only add more if there is a very strong justification.
What categories each stakeholder needs to do their job.
How many different types of category your stakeholders need in order to support the functionality they require. For example, you may want to record the impacted service, plus a two- or three-level category code (such as: End User Workstation / Hardware / Monitor), plus a closure code (such as: Training Required or Software Bug).
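To make the last point concrete, here is one possible way to model those three types of category as a single record. It is a minimal Python sketch, not a recommendation for any particular ITSM tool; the class name, field names, and the "Desktop Support" service are hypothetical, while the category and closure code values are the examples given above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IncidentClassification:
    """Hypothetical record combining the three types of category."""
    impacted_service: str
    category: Tuple[str, ...]           # two or three levels, broadest first
    closure_code: Optional[str] = None  # only known once the incident is closed

    def __post_init__(self):
        # Enforce the two- or three-level category code described in the text.
        if not 2 <= len(self.category) <= 3:
            raise ValueError("category must have two or three levels")

    def category_path(self) -> str:
        """Render the category code in the 'A / B / C' style."""
        return " / ".join(self.category)


example = IncidentClassification(
    impacted_service="Desktop Support",  # assumed service name
    category=("End User Workstation", "Hardware", "Monitor"),
    closure_code="Training Required",
)
print(example.category_path())
```

Keeping the impacted service, the category code, and the closure code as separate fields means each stakeholder can report on the one they care about without the category list growing to cover every combination.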
Many of the IT departments that I have worked with create regular customer reports that include large numbers of incident management statistics. A typical report has far more incident management metrics than all other metrics combined. When I talked to the customers about these reports I often found that they only looked at one or two of the metrics, and that they ignored almost all of the data in the report.
I have also seen incident management metrics that encourage service desk agents to behave in ways that neither the customer nor the service provider really wants. For example, if you have a target that agents will close 80% of incidents within 5 minutes, then this may cause them to deliver very poor quality diagnosis and incident resolution, in an attempt to meet the target.
My preferred approach to defining key performance indicators (KPIs) for incident management (and for any other practice) is to start by defining the critical success factors (CSFs) that you need to achieve. Then define a small number of KPIs to support each CSF.
For example, you might have a CSF that says “We resolve incidents quickly, so that they don’t have a significant impact on our customers”. You could support this with KPIs such as “Percentage of incidents closed within SLA times, by priority”, or “Number of priority 1 and 2 incidents that occurred in a 12 month period”. When you create customer reports, you can use the KPIs to show trends, but the discussion with the customer should focus on the CSF – and you should be asking the customer whether it was achieved, not telling them that you delivered the numbers so of course they must be happy.
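As an illustration of the first KPI above, the following Python sketch calculates the percentage of incidents closed within SLA times, by priority. The SLA targets, the record layout, and the sample figures are all assumptions made up for the example; your own targets and data will differ.

```python
from collections import defaultdict

# Hypothetical SLA resolution targets in hours, keyed by priority.
SLA_HOURS = {1: 4, 2: 8, 3: 24}

# Hypothetical closed incidents: (priority, hours taken to resolve).
closed_incidents = [
    (1, 3.0),
    (1, 5.5),
    (2, 6.0),
    (2, 7.5),
    (3, 30.0),
    (3, 12.0),
    (3, 20.0),
    (3, 23.0),
]


def percent_within_sla(records, targets):
    """Return {priority: percentage of incidents resolved within target}."""
    met = defaultdict(int)
    total = defaultdict(int)
    for priority, hours in records:
        total[priority] += 1
        if hours <= targets[priority]:
            met[priority] += 1
    return {p: 100.0 * met[p] / total[p] for p in sorted(total)}


kpi = percent_within_sla(closed_incidents, SLA_HOURS)
for priority, pct in kpi.items():
    print(f"Priority {priority}: {pct:.0f}% closed within SLA")
```

A figure like this is useful for showing a trend over time, but it only supports the CSF; it does not tell you whether the customer thinks incidents were resolved quickly enough.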
You can find more suggestions of KPIs that can help you to understand how well you are doing against your goals for incident management in my blog Defining Metrics for Incident Management.
Incident management can consume many IT resources but it is rarely perceived by customers as something that creates value. Because of this, it is important to constantly improve incident management, to ensure that we deliver the best possible outcome with the lowest possible use of resources.
Tip 1 – Focus on customer experience and business value, rather than process
Customer experience must be the main focus of all incident management activity. There’s no point in providing brilliant technical solutions if we don’t satisfy the customer. The focus on customer experience needs to start with the design of the process and tools, and carry through to recruiting, training, measuring, and rewarding incident management staff, and setting up appropriate relationships with suppliers.
Tip 2 – Separate incidents and problems
Don’t use incident management to resolve problems. Concentrate on helping your customers to get their work done. Diagnosing and resolving technical issues should be left to problem management, and should not normally be needed to resolve incidents.
Tip 3 – Shift left
Enable your staff to resolve incidents at the lowest level possible, by providing the tools, knowledge, and training that will help them to do this. Whenever level 2 or 3 support handles incidents, they should think about how they could empower less expensive staff to manage these same incidents in the future. The cheapest and fastest way to resolve incidents is often self-service, and you should invest in making this work well for your customers so that they want to use it.
Tip 4 – Share knowledge and information to help resolve incidents
Encourage staff to create knowledge and information that will help others to resolve incidents, and to use the knowledge and information that other people have created. Remember that good knowledge management is more about attitudes, behaviour, and culture than about tools.
Tip 5 – Make effective use of incident categories
If you haven’t reviewed your incident categories recently then it may be time you had a look at them. Make sure you know what you want to use them for, and then design a system that will satisfy all your stakeholders with as few categories as possible.
Tip 6 – Make sure your metrics drive the behaviours you want to encourage
Ensure that metrics and reporting for incident management meet your needs by considering how they influence staff behaviour, and how well they meet the needs of your customers. Reporting should be based on CSFs, and each CSF should be supported by a small number of KPIs.
If you follow these tips, then you should be able to make improvements in how you manage incidents that will help to reduce costs, improve service levels, and increase customer satisfaction – and you really can’t ask for more than that!