The Secret to Ending Known Errors

June 18th, 2019

As a young(er) sysadmin, I once tried to stand in the way of a major system rollout.

Development of this new system had taken over nine months, with a project team of over 100 people from across the business. One of the major goals of the initiative was to replace a few legacy systems and deliver a consolidated and standardized solution to the organization. The project was timed so that the new system would go online a few weeks before the leases and maintenance contracts came due on the legacy systems. Millions of dollars had been invested in the development of the new system, with additional monies at risk if the new system didn’t go online in time.

But I was aware of multiple bugs in the new system, ranging from minor to significant. And I raised the issue and my concerns to project leadership. I questioned whether we should go live, knowing we had bugs, or take a few more weeks to resolve these issues.

And I got run over. The new system went live, on time, as planned.

Back in the day, new development projects usually took a long, long time to get done. Invariably, there was a lot of scrambling at the end of the project to get those last features in, and those last unit tests and end-user acceptance tests done. There were usually some “last minute” errors or bugs discovered; sometimes, issues had been discovered early in development. But project plans and timelines never seemed to have enough time or budget built in to appropriately fix those bugs and known errors before going live.

But I’ve seen the future – and it doesn’t include the concept of “known errors.”

What is a known error?

ITIL defines a known error as a problem that has been analyzed, but not resolved. In other words, we know what’s wrong, but we haven’t fixed it.
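
To make that definition concrete, here is a minimal sketch of it as a check on a problem record. The `Problem` class and its field names are assumptions made up for illustration; they are not ITIL terminology or the schema of any ITSM tool.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    """Hypothetical problem record; the field names are illustrative only."""
    summary: str
    root_cause_analyzed: bool = False
    resolved: bool = False


def is_known_error(problem: Problem) -> bool:
    # Analyzed but not resolved: we know what's wrong, but we haven't fixed it.
    return problem.root_cause_analyzed and not problem.resolved


if __name__ == "__main__":
    backlog = [
        Problem("Report export times out", root_cause_analyzed=True),
        Problem("Intermittent login failure"),
        Problem("Wrong tax rounding", root_cause_analyzed=True, resolved=True),
    ]
    for problem in backlog:
        print(problem.summary, "->", is_known_error(problem))
```

Only the first record in the sample backlog counts as a known error under this reading: it has been analyzed, and it is still open.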

There are many reasons why an IT environment would contain known errors. Known errors may have been introduced as part of the installation of new hardware, or the implementation of commercial software. There are likely errors (some known) in operating systems and tools.

Most commercial vendors do a good job of documenting and publishing their known errors as new versions of their products are distributed to customers. As customers, we’ve become so accustomed to this practice that we hardly think anything of it, much less push back and insist that these issues get fixed.

But there is another way.

The answer is continuous integration

Continuous integration (CI) is a concept that has been popularized by DevOps. CI is a development practice that requires all developed code to be merged into a shared repository at least daily. Code development is typically done in small units of work, and a developer cannot keep code checked out for longer than a day. To ensure that code always remains in a deployable state, testing (typically automated) is always conducted before any merge of code.
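
The “test before any merge” rule can be sketched in a few lines. This is only an illustration under stated assumptions: `failed_checks` and `try_merge` are hypothetical names, the shared repository is modeled as a plain list, and each check is a stand-in function rather than a real test suite or CI tool.

```python
from typing import Callable, Dict, List


def failed_checks(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every automated check and return the names of those that failed."""
    return [name for name, check in checks.items() if not check()]


def try_merge(
    change_id: str,
    checks: Dict[str, Callable[[], bool]],
    shared_repository: List[str],
) -> List[str]:
    """Merge the change only if every check passes; return any failures."""
    failures = failed_checks(checks)
    if not failures:
        # All checks are green, so the change joins the shared repository.
        shared_repository.append(change_id)
    return failures


if __name__ == "__main__":
    repository: List[str] = []
    passing = {"unit tests": lambda: True, "acceptance tests": lambda: True}
    failing = {"unit tests": lambda: True, "acceptance tests": lambda: False}

    print(try_merge("change-1", passing, repository))
    print(try_merge("change-2", failing, repository))
    print(repository)
```

In this sketch a change with a failing check never reaches the shared repository, and the returned names say which check to look at.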

Sounds straightforward enough, doesn’t it? So what are the implications of adopting CI?

Because coding is done in small units of work, testing and validation become simpler. The quality of code should increase because effort can shift from “does it work?” to “does it meet the demand and deliver the needed results?”

But the biggest implication? If CI is done correctly, there should no longer be any possibility of a known error coming from development efforts. None.

The impact of no more known errors

Can you imagine what a world of no known errors would do for an organization’s ITSM environment? Here’s how I see it:

Problem management would shift from a reactive activity to a proactive activity. Rather than spending time, resources, and human effort fixing something that perhaps shouldn’t have happened to begin with, problem management would proactively look for and resolve any issues in the managed environment.
Incident management would be dealing with fewer incidents. This means that consumers would be able to spend more time doing their jobs, because services would be highly available. Front-line support would be able to do other value-added work, such as developing knowledge articles.
Change management would become more proactive by helping define test and validation success criteria prior to development activities, along with clearly defined approval guidance. As a result, change management would become an enabler, not a constraint, in getting code into the production environment.

Think about it: CI is simply doing things right, the first time.

Think about what CI would do for many current ITSM implementations where people are spending too much time responding to outages, or sitting in CAB meetings. For many organizations, CI would allow ITSM to evolve from something primarily done by IT Operations to something more holistic – whereby all of IT can work together towards the same goals.

Does CI (really) require an adoption of DevOps?

While CI is typically associated with DevOps, I don’t think that an organization has to adopt DevOps to get the benefits of CI. I think CI has to be an attitude, not necessarily related to a methodology or approach.

CI means that kicking an error or defect down the road is no longer acceptable within an organization.

CI means that testing becomes automated, which means that testing becomes consistent and repeatable.

CI means doing work in smaller increments, so that in the event of a failed test, it is easy to determine “what failed.”

Does it really matter, then, whether you’re following an Agile approach or a waterfall approach? While CI is typically associated with an Agile/Scrum methodology, who says that you can’t do CI as part of a waterfall project? In a waterfall approach (in which development is done in a linear fashion from concept through deployment), it means that integration and testing are happening throughout the project, not just during a late phase of the project.

4 tips for adopting a CI attitude

Ready to adopt a CI attitude? Here are some tips for doing so:

Training and awareness – A CI attitude means accountability, communication, collaboration, and participation. If that isn’t how you’d describe your environment today, then training and awareness are a critical first step for changing the culture of the organization.
Invest in test environments – Having production-like test environments enables CI to work most effectively.
Define and automate test routines – Test criteria and validation scripts must be defined, agreed, and documented, so that those actions can be automated.
Break down work into smallest possible chunks – Not only does this approach make testing simpler, but it also adds velocity and higher quality to the work that is being done.
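
As one way to picture the “define and automate test routines” tip, the agreed criteria can be written down as data and then evaluated by a script. This is a minimal sketch, not a prescribed method: the `Criterion` class, the `evaluate` function, and the metric names and thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Criterion:
    """One agreed and documented success criterion (hypothetical structure)."""
    name: str
    metric: str
    maximum: float


def evaluate(criteria: List[Criterion], measurements: Dict[str, float]) -> Dict[str, bool]:
    """Return pass/fail for each criterion, keyed by the criterion's name."""
    results: Dict[str, bool] = {}
    for criterion in criteria:
        value = measurements.get(criterion.metric)
        # A missing measurement counts as a failure rather than a silent pass.
        results[criterion.name] = value is not None and value <= criterion.maximum
    return results


if __name__ == "__main__":
    # Example criteria and numbers are invented for illustration.
    agreed = [
        Criterion("Login responds quickly", "login_seconds", 2.0),
        Criterion("No failed unit tests", "failed_unit_tests", 0),
    ]
    print(evaluate(agreed, {"login_seconds": 1.4, "failed_unit_tests": 0}))
```

Keeping the criteria as data means the same script can run on every change, which is what makes the testing consistent and repeatable.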

Will known errors ever become a thing of the past? Perhaps not until organizations expect and hold software and hardware suppliers accountable for having a CI attitude. But this shouldn’t stop organizations from realizing the benefits of a CI attitude internally – it is the way of the future!

Have you tried using continuous integration? What are you doing to eliminate known errors? Please let me know in the comments below.

About the Author

Doug Tedder

Doug is an ITSM and process improvement consultant, trainer, and accidental social media savant, enabling IT organizations to transform, sustain, and grow real business value. An active volunteer in the ITSM community, Doug is a frequent speaker and contributor to industry user group meetings, webinars, blogs, and national conventions.
