In part 1 of this blog, I focused on what to look for, and what to ask, when assessing a change. Here, in part 2, I go further, beyond the impact assessment, to look at minimizing the potential for adverse IT and business impact from poorly planned or executed changes.
You could argue that the title of my blog is now inappropriate but, in my opinion, it’s good to talk about change impact and the potential risks rather than change management per se. You could argue that this blog also dips its toes into release management too. But, if you have time to argue about blog titles, you probably need to find a more engaging job.
Sadly, it’s not just a case of hitting a big red button and cackling “fly my pretties, fly.” A good plan clearly sets out who does what, where, and at what time. Sounds simple, but I’ve lost count of the number of times I’ve had to spend rounding up engineers who were distracted mid-change or just weren’t aware of what they should be doing… “Oh right, that was today wasn’t it?”
Sometimes it’s like herding cats. So make sure that you’ve got any handover periods highlighted and primary and secondary contact details for everyone involved. Mobile signals can drop and Exchange servers can glitch, so make sure the people involved are contactable at key times.
How do we know the change was successful? “Okay people, we’ve deployed the change and nothing looks hideously wrong. Back away quietly and don’t make any sudden movements.” Does this sound vaguely familiar?
It’s important to run a few tests – a few simple tasks to ensure not only that the change has been deployed but that all relevant services are up, with everything as it should be. “Trust but verify” is the name of the game here.
You don’t have to “rock the world” – simple checks, like making sure that all the shared drives/folders on a Wintel server are available, or that system traffic is flowing correctly over your network, can make a world of difference to your customers and end users, and to ongoing business operations.
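To make “trust but verify” concrete, those post-change checks can be scripted so they run the same way every time, for every change. Below is a minimal Python sketch; the hostnames, ports, and shared-folder path are invented for illustration, and your real checks would cover whatever your change actually touched.

```python
import os
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical checks -- the hostnames, ports, and UNC path below are
# placeholders, not real infrastructure; substitute your own.
CHECKS = {
    "shared folder visible": lambda: os.path.isdir(r"\\fileserver01\shared"),
    "web front end responding": lambda: can_connect("appserver01", 443),
    "database port open": lambda: can_connect("dbserver01", 1433),
}

def run_checks(checks: dict) -> bool:
    """Run each named check, print a PASS/FAIL line, and return True
    only if every check passed."""
    all_ok = True
    for name, check in checks.items():
        ok = bool(check())
        print(f"[{'PASS' if ok else 'FAIL'}] {name}")
        all_ok = all_ok and ok
    return all_ok

# Usage at the end of the change window, e.g.:
#   if not run_checks(CHECKS):
#       raise SystemExit("Post-change verification failed")
```

The point isn’t the code itself – it’s that a scripted checklist removes the “nothing looks hideously wrong” guesswork and gives you a repeatable pass/fail record for the change record.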
With the best will in the world, even if you’ve done everything right before the change, something unexpected could still pop up. Hardware failures, power spikes, and code glitches all can, and do, happen – so you need to have a robust plan in case something goes wrong during the implementation window.
So consider how you’ll deal with something unforeseen. Can you roll back, fix on fail, or do something else? There’s no right answer here; just make sure that if something does go wrong, you have a plan to restore service – with the appropriate timings built into the change window. Also ensure that the team implementing your change is empowered to make decisions, and if there are points during the change window that you know pose particular risks, schedule a quick call with the involved parties to discuss them.
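The timing point is worth making concrete: a remediation plan only works if you leave enough of the change window to execute it. Here’s a rough sketch of that logic in Python, with `deploy()`, `verify()`, and `rollback()` as placeholders for whatever your real change, verification tests, and back-out steps are:

```python
import time

def apply_change(deploy, verify, rollback,
                 window_s: float, rollback_budget_s: float) -> str:
    """Run a change inside its window while always keeping enough time
    in reserve to roll back. The three callables are placeholders for
    the real deployment, verification, and remediation steps."""
    start = time.monotonic()
    # Latest point at which we can still decide to back out safely.
    decision_deadline = window_s - rollback_budget_s

    deploy()

    if time.monotonic() - start > decision_deadline:
        # Deployment alone ate into the rollback reserve: back out
        # rather than risk overrunning the window.
        rollback()
        return "rolled back"

    if verify():
        return "success"

    rollback()
    return "rolled back"
```

For example, a one-hour window with a fifteen-minute back-out plan would be `apply_change(install_patch, run_checks, restore_backup, window_s=3600, rollback_budget_s=900)` – the useful discipline is deciding the “point of no return” before the change starts, not during it.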
Which environment are we applying the change to? Don’t give me that look. I know that we’re overrun with changes to our production environments, but other environments such as pre-production or DR (disaster recovery) are important too.
Why should you be worried about DR environments? A friend of mine once worked for a large investment bank in the City. During a code change to one of the bank’s most business-critical systems – the market data feed to the trading floors – things were taking a little longer than expected. So, ignoring what was written in the change plan, the decision was made to update only the production environment. The implementation team always planned to go back and update the DR environment but, as tends to happen in the busy land of IT, they got distracted by other operational priorities and it was never done.
Fast-forward six weeks and a crisis hit the trading floor. The call was made to invoke DR, but they couldn’t – the market data services were out of sync. Cue a hugely stressful two hours in which the whole IT organization desperately scrambled for a fix. The business impact was estimated at over US$8 million – not a mistake that any IT professional wants to be held responsible for. Well, not if they want to stay employed.
By asking the right questions (and using templates or change models to make raising changes less painful), you not only mitigate adverse business risk but also, if something does go wrong, support incident management in fixing things as quickly as possible. You can also support problem management in identifying or confirming the root cause and raising the appropriate actions to ensure that similar future changes don’t fail. You’re never going to achieve “bulletproof” change management, but using the list in the first blog, and covering the points made in this one, will help to ensure that you’re at least consistent – which will go a long way to reducing risk and the potential for disruption to business operations.
What are your top tips for change impact assessment?