Nearly 30 years ago, ITIL® launched itself on an unsuspecting world with the publication of five books. One of those was called Help Desk, and another dealt separately with Problem Management. Separating two requirements – getting people working again and finding out why they stopped – was one of the key elements of the fledgling ITIL guidance. The help desk, via the incident management process, was meant to deal with the immediate effects and get people working again. Problem management, on the other hand, was all about finding out what went wrong and preventing it from happening again.
As with all the ITIL processes that are considered part of service management, the basic principles are ones we are used to in our everyday domestic lives. For example, think about cooking one of your regular recipes, something simple like a pasta bake. You get it prepared and ready for the oven, and then can’t find the baking dish you normally use. What do you do? Dinner has to be ready on time, so you use a different dish, a bit bigger than your usual one. The bigger surface area will affect the result a little: the topping will be spread more thinly, and the shallower depth means it cooks more quickly, perhaps changing the texture slightly. So it may not be perfectly suited, but it will do the job and get dinner on the table on time, which is your priority for now – incident solved. Later on, you go in search of the missing dish, or perhaps even buy a new one to replace it – problem resolved.
However, this simplistic approach (solve the incident now and the problem later) is not always the best one. We need to consider some other aspects too:
So, feedback about the incident and its resolution is important input to problem management. If the improvised solution delivers as good a result or a better one, the problem management folks need to know, so that they don’t spend time working on a solution to something that no longer needs fixing. Otherwise, they could well succeed in restoring the original situation when what you have now is actually better!
We can therefore see that ongoing communication between incident and problem management is also important, because incident management actions and observations can influence problem management in several ways:
To find the underlying cause, the problem management investigation might need to verify existing information or seek more. In our cooking example, we might need to ask what else was happening when the incident occurred. Was the dishwasher running, for example? Have storage locations been changed recently? Has a child just gone off to university and perhaps taken some kitchenware along to use in the dormitory?
In the more complicated world of IT service management (ITSM), ongoing communication is vital to understanding causes. The problem team might ask incident teams to look out for certain incidents, perhaps even adjusting the normal priority settings to help detect trends.
While the incident and problem teams have different roles, that doesn’t mean they have to be made up of different people. In fact, some overlap can be worthwhile. Someone who is aware of the incidents, and of how they were handled, already has a lot of information that will help with problem detection. In small organizations this overlap is inevitable (and probably a good thing); in larger ones, separating the roles can lead to duplicated work, which is inefficient but sadly common. Using the incident team as a communication channel to those affected, and having them maintain and actively use that channel, requires collaboration, but it is effort well spent.
When a problem needs investigation, knowing when to stop can be a difficult judgement call. In IT terms, labelling a cause as a ‘network error’ might look like an answer, but it doesn’t necessarily help prevent the incident from happening again. Probing deeper might mean handing the investigation over to a more specialized team. Processes and procedures that encourage this can really help in finding the actual underlying cause. Without them, problems may appear to be solved quickly, but they tend to recur more often.
Going back to our kitchen example, let’s say we found the missing dish broken in pieces in the garbage bin. That explains why it wasn’t available, but it might not help prevent the replacement dish from breaking too. Further investigation that reveals your children had been playing with the dish might keep the next one intact. You could tell them not to play with it anymore, or you might feel the need to store the dishes in a higher cupboard to prevent recurrence.
So, along with knowing the difference between incident and problem management, do you now also see the benefits of having them work together?