As a young(er) sysadmin, I once tried to stand in the way of a major system rollout.
The development effort for this new system had taken over nine months, with a project team of more than 100 people from across the business. One of the major goals of the initiative was to replace a few legacy systems and deliver a consolidated and standardized solution to the organization. The project was timed so that the new system would go online a few weeks before the leases and maintenance contracts came due on the legacy systems. Millions of dollars had been invested in the development of the new system, with additional money at risk if the new system didn’t go online in time.
But I was aware of multiple bugs in the new system, ranging from minor to significant, and I raised my concerns with project leadership. I questioned whether we should go live, knowing we had bugs, or take a few more weeks to resolve these issues.
And I got run over. The new system went live, on time, as planned.
Back in the day, new development projects usually took a long, long time to get done. Invariably, there was a lot of scrambling at the end of the project to get those last features in, and those last unit tests and end-user acceptance tests done. There usually were some “last minute” errors or bugs discovered; sometimes, there were issues that had been discovered early in development. But project plans and schedules never seemed to have enough time or budget built in to properly fix those bugs and known errors before going live.
But I’ve seen the future – and it doesn’t include the concept of ’known errors.’
ITIL defines a known error as a problem that has been analyzed, but not resolved. In other words, we know what’s wrong, but we haven’t fixed it.
There are many reasons why an IT environment would contain known errors. Known errors may have been introduced as part of the installation of new hardware, or the implementation of commercial software. There are likely errors (some known) in operating systems and tools.
Most commercial vendors do a good job of documenting and publishing their known errors as new versions of their products are distributed to customers. As customers, we’ve become so accustomed to this practice that we hardly think anything of it, much less push back and insist that these issues get fixed.
But there is another way.
Continuous integration (CI) is a concept that has been popularized by DevOps. CI is a development practice that requires all developed code to be merged, at least daily, into a shared repository. Code development is typically done in small units of work, and a developer cannot keep code checked out any longer than a day. To ensure that code always remains in a deployable state, testing (typically automated) is always conducted before any merge of code.
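To make the "test before any merge" rule concrete, here is a minimal sketch in Python of what such a gate might look like. It is only an illustration: the names `Check` and `merge_gate` are hypothetical, not part of any particular CI product, and a real pipeline would run actual test suites rather than the stand-in functions assumed here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Check:
    """One automated check, e.g. a unit-test suite (hypothetical type)."""
    name: str
    run: Callable[[], bool]  # returns True when the check passes


def merge_gate(checks: List[Check]) -> Tuple[bool, List[str]]:
    """Run every check; the merge is allowed only if all of them pass.

    Returns (allowed, names_of_failed_checks).
    """
    failed: List[str] = []
    for check in checks:
        try:
            ok = check.run()
        except Exception:
            # A check that crashes is treated as a failure, not skipped.
            ok = False
        if not ok:
            failed.append(check.name)
    return (not failed, failed)
```

The point of the sketch is the policy, not the plumbing: a failed or crashing check blocks the merge, so a defect is dealt with now instead of being logged as a known error for later.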
Sounds straightforward enough, doesn’t it? So what are the implications of adopting CI?
Because coding is done in small units of work, testing and validation become simpler. The quality of code should increase because effort can shift from “does it work?” to “does it meet the demand and deliver the needed results?”
But the biggest implication? If CI is done correctly, there should no longer be any possibility of a known error coming from development efforts. None.
Can you imagine what a world of no known errors would do for an organization’s ITSM environment? Here’s how I see it:
Think about it – CI is simply doing things right, the first time.
Think about what CI would do for many current ITSM implementations where people are spending too much time responding to outages, or sitting in CAB meetings. For many organizations, CI would allow ITSM to evolve from something primarily done by IT Operations to something more holistic - whereby all of IT can work together towards the same goals.
While CI is typically associated with DevOps, I don’t think that an organization has to adopt DevOps to get the benefits of CI. I think CI has to be an attitude, not necessarily related to a methodology or approach.
CI means that kicking an error or defect down the road is no longer acceptable within an organization.
CI means that testing becomes automated, which means that testing becomes consistent and repeatable.
CI means doing work in smaller increments, so that in the event of a failed test, it is easy to determine “what failed.”
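As a rough illustration of why small increments make "what failed" easy to answer, here is a short Python sketch. It assumes a hypothetical `passes` function that stands in for an automated test run over the changes applied so far; the function name `first_failing_change` is likewise made up for this example.

```python
from typing import Callable, List, Optional


def first_failing_change(
    changes: List[str],
    passes: Callable[[List[str]], bool],
) -> Optional[str]:
    """Apply small changes one at a time, testing after each.

    Returns the first change after which the tests fail,
    or None if every increment passes.
    """
    applied: List[str] = []
    for change in changes:
        applied.append(change)
        if not passes(applied):
            # Only one small change was added since the last green run,
            # so the culprit is unambiguous.
            return change
    return None
```

When each increment is tested as it is integrated, a failure points at one small change. With a single large merge at the end of a project, the same failure could be hiding anywhere in months of work.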
Does it really matter then if you’re following an Agile approach or a waterfall approach? While CI is typically associated with an Agile/Scrum methodology, who says that you can’t do CI as part of a waterfall project? What it means – in a waterfall approach in which development is done in a linear fashion from concept through deployment – is that integration and testing are happening throughout the project, not just during a late phase of the project.
Ready to adopt a CI attitude? Here are some tips for doing so:
Will known errors ever become a thing of the past? Perhaps not until organizations expect and hold software and hardware suppliers accountable for having a CI attitude. But this shouldn’t stop organizations from realizing the benefits of a CI attitude internally – it is the way of the future!
Have you tried using continuous integration? What are you doing to eliminate known errors? Please let me know in the comments below.