Look Harder to Find the Real Heroes of Problem Management

By | September 17, 2015 in ITIL

Everyone loves heroes, and not just in comic books. Heroes come in and save the day when things go wrong, and naturally anyone suffering as a result of that ‘going wrong’ is pleased to be saved.

Society employs heroes who are there to save us if misfortune occurs: firefighters, police, and even the army; and we expect our utility suppliers (water, electricity, gas, telephone, etc.) to have such heroes too, ready to go out in the worst weather to fix any damage. But what we, as customers and consumers, would actually prefer is that things just keep working seamlessly – no water or power failures in the first place – and certainly we don’t ever want to need firefighters rushing to our homes. We like to know that the hero is there, but we don’t want to actually need them and their heroics.

Problem Management

In service management (whether applied to IT or non-IT), the biggest and bravest heroes live inside the problem management team, notably the reactive problem management team, seeking out root causes and solutions to fix what goes wrong. Just like firefighters, many of these guys enjoy their work the most when faced with challenges and having to fix catastrophic failures or mystery issues that require analysis and skill to solve.

However, if we judge this team only on how well they react to disasters and issues, then we inadvertently set up a mechanism whereby the more things go wrong, the better they look. It’s like judging the fire service on how many major fires they put out. The easiest way for them to look good would be to go around starting fires so they can put them out. Now that’s a chilling thought!

Of course neither the firefighters nor most problem management teams would deliberately cause issues to happen, but without incentivizing them otherwise, they might not focus on trying to prevent disruptions.

For the fire service, prevention rests upon aspects like promoting risk awareness and installing and maintaining smoke detectors, as well as ensuring that people’s emergency skills and equipment are ready to roll should they be needed.

Reactive and Proactive

ITIL® has always (well, since 1988) maintained that problem management has two sides: reactive and proactive. Reactive deals with what has actually gone wrong; proactive deals with prevention of impact through means like:

  • Fixing things before they have any impact, such as repairing a failed backup component where redundancy is in place
  • Ensuring things remain healthy by scheduling preventative maintenance
  • Seeking potentially weak or suspect parts of the infrastructure and repairing them before they fail

All this sounds good, but how on earth do you measure it, and justify the money it costs? Within any organization, it’s always hard to justify spending money, particularly when that funding is for making things NOT happen. Whilst it may be the best and preferred situation for everyone, no-one gets fair recognition by pointing out the things that didn’t happen because of their work. For example, the heroes of the Victorian age tend to be the engineers who built the big visible things like bridges and ships, whereas the biggest advancement at that time was perhaps the work done in preventing infectious diseases through the less visible things like water supply and especially the sewers. But we don’t instinctively judge progress by seeing when cholera stopped being widespread.

Corporate management should also want a set of services that don’t break, failures that don’t happen, and business output uninterrupted by errors. But even if management recognizes this, it can be hard to set targets and encourage improvement. And it can be even harder to get funding to expand, or even keep, resources dedicated to this preventative work. The issue lies with the difficulty of quantifying this type of work, but thankfully there are workarounds.

Measuring the Value

So what can be measured that would indicate that the proactive problem management team is delivering what is needed by the business? The most powerful way is to measure the damage being caused and (hopefully) how it’s gradually being reduced month by month through things such as:

  • Service availability. How often is someone unable to do their job because a service isn’t available to them?
  • Repeat incidents. Even the most trivial interruptions and failures can get in the way of the business. Working on making sure recurring faults go away reduces the damage.
  • Cost of IT. Failures cost money and spending too much money is damaging to the business. Not having failures saves money. It’s that simple.
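
As a rough illustration of tracking one of these measures month by month, here is a minimal Python sketch that counts repeat incidents from an incident log. The log format, the sample data, and the function name `repeat_incidents_by_month` are all hypothetical, and "repeat" is assumed here to mean any incident whose category occurred more than once in the same month; a real service desk may define it differently.

```python
from collections import Counter

# Hypothetical incident log: (month, category) pairs.
# Real data would come from whatever service desk tool is in use.
incidents = [
    ("2015-07", "printer offline"),
    ("2015-07", "printer offline"),
    ("2015-07", "printer offline"),
    ("2015-07", "vpn login failure"),
    ("2015-08", "printer offline"),
    ("2015-08", "printer offline"),
    ("2015-08", "vpn login failure"),
    ("2015-09", "vpn login failure"),
]


def repeat_incidents_by_month(incidents):
    """Return {month: number of incidents in categories seen more than once that month}."""
    per_key = Counter(incidents)  # how often each (month, category) pair occurs
    repeats = {month: 0 for month, _ in incidents}  # include months with no repeats
    for (month, _category), count in per_key.items():
        if count > 1:
            repeats[month] += count
    return dict(sorted(repeats.items()))


trend = repeat_incidents_by_month(incidents)
for month, count in trend.items():
    print(month, count)
```

With this made-up data the count falls from 3 to 2 to 0 over three months, which is the kind of downward trend that can help show preventative work is paying off.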

One of the biggest messages to get across is that spending money on prevention is often money well spent. Most people who rely on their car to get to work spend money to have it serviced regularly and replace components like brake pads before they fail. Funding proactive problem management is just applying that proven logic to the business.

We prefer our car to keep working and don’t want to be in a situation where we admire the mechanic who fixed our car in the rain on the roadside at night. What that preference translates to, in ITSM terminology, is mean time between failures (MTBF) – where the longer the time, the happier the user is likely to be, and it’s a good focus for the supplier to have.
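
For anyone who wants to put a number on it, MTBF is commonly calculated as total uptime divided by the number of failures in a given period. The Python sketch below shows that arithmetic under some simplifying assumptions: outages do not overlap, they all fall inside the observation window, and the `Outage` record and `mtbf_hours` function are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    """One service failure: when it started and when service was restored."""
    start: datetime
    end: datetime


def mtbf_hours(window_start, window_end, outages):
    """Mean time between failures, in hours: total uptime / number of failures.

    Assumes outages do not overlap and all lie inside the window.
    Returns None when there were no failures, because MTBF is undefined then.
    """
    if not outages:
        return None
    window = window_end - window_start
    downtime = sum((o.end - o.start for o in outages), timedelta())
    uptime = window - downtime
    return uptime.total_seconds() / 3600 / len(outages)


# Made-up example: a 30-day month with a 2-hour and a 4-hour outage.
month_start = datetime(2015, 9, 1)
month_end = month_start + timedelta(days=30)
outages = [
    Outage(datetime(2015, 9, 5, 10), datetime(2015, 9, 5, 12)),
    Outage(datetime(2015, 9, 20, 8), datetime(2015, 9, 20, 12)),
]
print(mtbf_hours(month_start, month_end, outages))  # 714 hours up / 2 failures = 357.0
```

If preventative work removes one of those two failures the following month, the figure roughly doubles, which gives the proactive team a number that goes up when fewer things go wrong.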

So, here’s what I propose for service managers today: let’s focus more on recognizing, rewarding, and funding the preventative aspects of what we do, and acknowledge the true heroes of problem management.

About Oded Moshe

Oded is VP Products at SysAid, with over 15 years of experience in various product and IT management positions. Proud father of two young (iPhone/iPad-addicted) girls and one baby boy (they're trying to keep the gadgets out of his reach). Fond of new technologies, and enjoys good conspiracy books and movies.
