ITSM Basics: A Simple Introduction to Problem Management

By Joe The IT Guy | July 14, 2015 in ITIL


If you regularly read my blog you’ll know that I’ve already written a fair bit on the tough nut to crack that is problem management. It’s often something that’s started as part of the latest IT service management (ITSM) tool implementation project, but it’s not unusual for this initial investment in problem management (processes) to fail in execution due to one or more reasons.

From a problem management uptake perspective, if you believe what the annual industry surveys report, roughly two-thirds of IT organizations are already “doing” problem management. But it’s not always what it should be, i.e. the investment of time and resources to proactively investigate and address recurring IT and business issues, and their root causes. It’s this type of investigation that helps to identify the issues that cause (or may ultimately cause) repetitive and potentially serious IT and business issues or failures. Instead, IT organizations are often just doing major incident reviews, using problem management techniques, as and when needed. It’s problem management of sorts but not truly effective problem management.

In reality, problem management is often something of a “poor relative” to service desk and incident management activities. Whereas service desk and incident management commonly receive adequate investment in terms of staff, definition, training, and ongoing operation, problem management is often “something to be done later” and therefore often not done at all.

In my opinion, the low level of proactive problem management adoption is quite ironic. The pressure to cut IT operational costs is why many IT organizations don’t do problem management, but it should be the reason why they need to be doing problem management. Of all the major ITIL processes, the investment of time and resources in truly effective problem management activity can provide some of the highest returns to an organization.

So to give you a simple introduction to problem management, I’ll quickly cover:

  • What problem management is
  • The objectives of problem management
  • The “problem lifecycle”
  • The benefits of problem management

I refer to ITIL a fair bit, you might think too much, but you can quite easily use your own self-created problem management process and activities or look to alternative sources of ITSM and IT management advice such as ISO/IEC 20000, ISACA’s COBIT, USMBOK, or Microsoft’s MOF.

Where Problem Management Fits In

Problems (definition below) can be identified throughout the IT ecosystem, for example through acceptance into production, changes, updates/patches, vendor products, user errors, production execution, and failures. However, the main source for problem identification within an organization is probably the analysis of incidents as part of what is often called the “proactive problem management process.”
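
To make the idea of analyzing incidents a little more concrete, here is a minimal Python sketch that flags recurring incidents as problem candidates. It is an illustration only: the `Incident` fields, the `problem_candidates` helper, the sample records, and the threshold of three occurrences are all assumptions made for this example, not something ITIL or any particular ITSM tool prescribes.

```python
from collections import Counter
from typing import NamedTuple


class Incident(NamedTuple):
    incident_id: str  # hypothetical ticket reference
    service: str      # hypothetical field: the affected IT service
    category: str     # hypothetical field: the symptom category


def problem_candidates(incidents, min_count=3):
    """Return the (service, category) pairs seen at least min_count times."""
    counts = Counter((i.service, i.category) for i in incidents)
    return sorted(key for key, n in counts.items() if n >= min_count)


# Made-up sample data, purely for illustration.
incidents = [
    Incident("INC-1", "email", "login failure"),
    Incident("INC-2", "email", "login failure"),
    Incident("INC-3", "vpn", "timeout"),
    Incident("INC-4", "email", "login failure"),
]

candidates = problem_candidates(incidents)
print(candidates)
```

In practice the grouping key and the threshold would depend on how your own service desk categorizes incidents; the point is simply that repeated symptoms, rather than single tickets, are what suggest a problem worth investigating.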

Not only is problem management often associated solely with major incidents; another barrier to effective problem management is that problems are often confused with incidents (with the two terms wrongly used interchangeably), or they are seen as an incident state rather than a separate entity requiring a different type of ITSM response.

If it helps, an easy way to remember the difference between the two is that:

  • Incident management is “put the fire out ASAP!” (so it’s firefighting), whereas
  • Problem management is “how did this happen?” and “how do we stop this happening again?” (so it’s arson investigation/fire prevention).

To succeed at problem management, IT senior management needs to appreciate that far too many costly, and possibly scarce, IT resources are currently spent fighting repetitive fires and that these resources would be better utilized supporting problem management activity to tackle the root causes, rather than the symptoms, of IT failures.

Problem Management Definition

ITIL, the ITSM best-practice framework formally known as the IT Infrastructure Library, uses the term problem to describe:

“The unknown cause of one or more incidents.”

And it describes problem management as:

“The process of minimizing the adverse effect on the business of incidents and problems caused by errors in IT infrastructure and systems, and to proactively prevent the occurrence of incidents, problems, and errors.”

A problem will become a “known error” when the root cause is known and a temporary “workaround” or a permanent alternative solution has been identified.

For completeness, although I state my own benefits below, ITIL states that the value of problem management includes:

  • “Higher availability of IT services by reducing the number and duration of incidents that those services may incur. Problem management works together with incident management and change management to ensure that IT service availability and quality are increased. When incidents are resolved, information about the resolution is recorded. Over time, this information is used to speed up the resolution time and identify permanent solutions, reducing the number and resolution time of incidents.
  • Higher productivity of IT staff by reducing unplanned labour caused by incidents and creating the ability to resolve incidents more quickly through recorded known errors and workarounds.
  • Reduced expenditure on workarounds or fixes that do not work.
  • Reduction in cost of effort in fire-fighting or resolving repeat incidents.”

Problem Management Objectives

ITIL defines the objectives of the problem management process as:

  • “Preventing problems and resulting incidents from happening.
  • Eliminating recurring incidents.
  • Minimizing the impact of incidents that cannot be prevented.”

Importantly, problem management can’t operate in a vacuum.

Problem management should have strong relationships with other key IT service management processes. In addition to the more-obvious linkages with incident and change management, it also needs to use configuration management data to help determine the impact of problems and resolutions. Let’s also not forget that availability management has a dependency on problem management information and activity, and some problems will require investigation by capacity management teams and techniques.

Problem management can also be an entry point into IT service continuity activity and major incident management, where a significant problem needs to be resolved before it starts to have a major adverse impact on the business. Finally, from a service level management perspective, problem management contributes to improvements in service levels, and its management information should be used as the basis for service review activity.

The “Problem Lifecycle”

While the problem lifecycle is not linear like that of incident management, you can view a problem as going on a journey from identification through to “resolution,” where resolution might come from error control or the creation of a workaround.

Thus it’s worth understanding that there are two common problem management “sub-processes”:

  • Problem control – which focuses on transforming problems into known errors (and workarounds)
  • Error control – which focuses on resolving known errors via the corporate change management process

The result might be one of three outcomes:

  1. A change is required to correct a problem – the organization should use an “error control” process to correct the problem via the corporate change management process.
  2. The problem cannot be fixed but a workaround is identified – the problem is classified as a known error with a workaround (a temporary way of resolving the incident), logged in a known error database, and made available to all support teams for ongoing incident resolution activity.
  3. No fix or workaround is identified – when a problem is investigated but no solution or workaround is found, it is recorded as a “known problem,” with the information again made available for the benefit of all support teams.

It’s important to recognize that these three problem states are not mutually exclusive and that a problem may move between them over time. For instance, when possible, a workaround should still be made available while a problem is awaiting the implementation of a required change.
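
If it helps to see these outcomes as data, here is a minimal Python sketch of a problem record that reports which of the three outcomes currently apply. Everything in it is an assumption made for illustration: the `ProblemRecord` class, its `fix_identified` and `workaround` fields, and the `Outcome` names are hypothetical and are not taken from ITIL or from any ITSM tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set


class Outcome(Enum):
    CHANGE_REQUIRED = "change required (handled by error control)"
    KNOWN_ERROR = "known error with a workaround"
    KNOWN_PROBLEM = "known problem (no fix or workaround yet)"


@dataclass
class ProblemRecord:
    problem_id: str                    # hypothetical identifier
    fix_identified: bool = False       # hypothetical: a corrective change is known
    workaround: Optional[str] = None   # hypothetical: temporary way to resolve incidents

    def outcomes(self) -> Set[Outcome]:
        """Return every outcome that currently applies to this problem."""
        result = set()
        if self.fix_identified:
            result.add(Outcome.CHANGE_REQUIRED)
        if self.workaround:
            result.add(Outcome.KNOWN_ERROR)
        if not result:
            # Investigated, but neither a fix nor a workaround has been found.
            result.add(Outcome.KNOWN_PROBLEM)
        return result


problem = ProblemRecord("PRB-1")
print(problem.outcomes())   # nothing found yet

problem.workaround = "Restart the mail gateway service"
print(problem.outcomes())   # a workaround is now available

problem.fix_identified = True
print(problem.outcomes())   # change pending, workaround still offered
```

Returning a set rather than a single value is a deliberate choice in this sketch: it reflects the point that the states are not mutually exclusive, so a problem awaiting a change can still carry its workaround.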

My simple diagram hopefully provides a snapshot of what can happen with problem management.

[Diagram: ITIL's problem lifecycle flowchart]

Problem Management Benefits

In my opinion the key benefits of problem management include, but are not restricted to:

  • Decreasing downtime and thus potentially maximizing business productivity
  • Preventing incidents before they adversely impact business operations
  • Making better use of potentially scarce IT resources
  • Better collaboration between different IT teams in preventing recurring issues; defined roles and responsibilities and a single, consistent process not only speed things up but also reduce duplication of effort and wastage
  • The ability to leverage existing known error and “workaround” knowledge to prevent the proverbial “reinvention of the wheel” and to speed up resolution
  • Reducing the costs associated with both IT service delivery and IT support – best practice processes and automation can both save time and effort, and therefore cost
  • Reducing the adverse effect of business-impacting incidents through prevention or workarounds; this might potentially include lost revenue, lost reputation, or even lost customers
  • Improving customer service and the business’s perceptions of the IT organization as a whole

Well there you have it, a quick guide and a simple introduction to problem management. Hopefully you found it helpful.

If you want to read more from me, and a few of my friends, on problem management, then please look at:

Learn about SysAid Problem Management
Joe The IT Guy

About Joe The IT Guy

Native New Yorker. Loves everything IT-related (and hugs). Passionate blogger and Twitter addict. Oh...and resident IT Guy at SysAid Technologies (almost forgot the day job!).

4 thoughts on “ITSM Basics: A Simple Introduction to Problem Management”

  1. Ian Clayton

    One of the biggest issues with problem management is defining the problem and its impact in terms ‘felt’ and understood by those who are negatively impacted by the situation, either now or in the future. ITIL has never offered any help in this area. There are ample, well proven techniques for doing this, and prompting others to support action and the investment of time to perform ‘root cause analysis (RCA)’. Now RCA is another topic I’ve plenty of comments on – but I’ll wait for another blog 🙂


  2. Michael Hall

    Useful overview until you got the very old idea of problem control and error control – unnecessary complication. The simplest problem flow is find the problem -> find the cause(s) -> fix the causes.
    I think you could talk more about how to find cause and the different levels of cause (direct, root, contributing etc.). Maybe that is a topic for another day and another blog …

    Enough nitpicking. I am glad you are spreading the word about the value proposition of problem management, this is the really good part of your post.

    Much appreciated … Michael


  3. Sanjay

    Simply put, problem identification is either reactive or proactive. To add some more to the diagram presented:
    1. After a problem has been resolved, the effectiveness of the resolution has to be measured: has the solution brought down the number of incidents considerably? Has the change initiated by the problem made a significant difference to the infrastructure or to the business? Bottom line: how effective has it been for the business?
    2. After its resolution, the problem should also make its way into the Solution Database (SDB), which should in turn form an input to the Knowledge Management function so that documentation is developed for the identified problem, with the known error also closed.
    3. Another improvement that problem management can bring is the improvement of self-help solutions, reducing the number of incidents reported.

