Don’t be fooled – by the cloud tech-talk of instances, databases, code, and APIs – into thinking that cloud is all about technology. Unfortunately, this kind of tech-talk can lull IT service management (ITSM) professionals into thinking that cloud is just another technology evolution with no impact on ITSM. Those who realize that cloud is really about business, applications, service, and operations, however, will understand its impact across the ITIL ITSM best practice framework, and in particular on capacity management.
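Cloud requires a different capacity management process from the one used for traditional, on-premise IT services. The key changes fall into five approaches, all of which make capacity management more granular and shift it from the long-range “vague” to the short-range “specific”: shorter forecasting horizons, faster responses to changes in demand, finer-grained units of capacity, elastic handling of demand that exceeds supply, and pay-as-you-go purchasing.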
An ITSM professional who understands these five cloud capacity management approaches will be a huge asset to any organization, measured in terms of both the business bottom line and service quality.
Buying the full IT stack for on-premise IT service delivery is a long, difficult, complex, and expensive process. Want to know how long?
It takes months (nine to twelve is typical) to design, procure, and deploy any reasonably complex system on-premise. Once deployed, it has a lifetime of three, five, or seven years, maybe longer. This is the long, long length of the on-premise capacity management horizon.
Over that time, capacity is over-provisioned for peak workloads, and this over-provisioning burns money. One might as well be throwing dollar bills out of the window. But in the traditional IT operations spirit of “I only get fired for outages,” capacity management prefers this waste to under-provisioning, which can hurt customers and therefore the business.
A capacity manager doesn’t face this multi-year horizon with cloud services. The capacity manager now only needs to forecast as far ahead as it takes to add more capacity to the cloud services, and that is typically around 15 minutes from decision to deployment, including the time to make a coffee and get comfy in front of the console.
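One way to see why the horizon matters is to size the headroom a service must carry while new capacity is on its way. The sketch below is a hypothetical illustration. It assumes demand grows linearly at half an instance per day and uses the lead times above: about 15 minutes in the cloud and about nine months (270 days) on-premise. The function name `headroom_needed` is invented for this example.

```python
import math

# Hypothetical illustration: assumes demand grows linearly, so the headroom
# needed is simply growth rate x time to bring new capacity online.
def headroom_needed(growth_per_day: float, lead_time_days: float) -> int:
    """Extra instances to hold in reserve so demand cannot outgrow supply
    before newly ordered capacity is available."""
    return math.ceil(growth_per_day * lead_time_days)

CLOUD_LEAD_TIME_DAYS = 15 / (24 * 60)   # roughly 15 minutes, expressed in days
ON_PREM_LEAD_TIME_DAYS = 270            # roughly nine months

growth = 0.5  # hypothetical: demand grows by half an instance per day
print("cloud headroom:", headroom_needed(growth, CLOUD_LEAD_TIME_DAYS))
print("on-premise headroom:", headroom_needed(growth, ON_PREM_LEAD_TIME_DAYS))
```

Under these assumptions the cloud service needs one spare instance of headroom. The on-premise service must be sized 135 instances ahead of today's demand, and that is before any allowance for forecast error.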
Cloud capacity managers still make longer-range forecasts, but they do so to save money by purchasing reserved cloud capacity, which can cut costs by over 60% compared with on-demand pricing. So capacity management still has a role to play in longer-range planning, but that role is now about financial efficiency, not the avoidance of disaster.
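A little arithmetic shows why reserving capacity is still a forecasting decision. The sketch below assumes a hypothetical on-demand price of $0.10 per instance-hour and a 60% reservation discount. These are illustrative figures, not a real price list. It also simplifies a reservation to a discounted rate paid for every hour of a one-year term. Under those assumptions, reserving pays off only if the instance would otherwise run for more than 40% of the year.

```python
# Hypothetical figures for illustration only; not a real price list.
ON_DEMAND_HOURLY = 0.10   # assumed on-demand price per instance-hour (USD)
RESERVED_DISCOUNT = 0.60  # assumed discount for a one-year commitment
HOURS_PER_YEAR = 8760

def annual_costs(utilization: float) -> tuple[float, float]:
    """Return (on_demand_cost, reserved_cost) for one instance over a year.

    `utilization` is the fraction of the year the instance actually runs.
    On-demand is billed only for hours used; the reservation is modeled as
    a discounted rate paid for every hour of the term, used or not.
    """
    on_demand = ON_DEMAND_HOURLY * HOURS_PER_YEAR * utilization
    reserved = ON_DEMAND_HOURLY * (1 - RESERVED_DISCOUNT) * HOURS_PER_YEAR
    return on_demand, reserved

def breakeven_utilization() -> float:
    """Utilization above which reserving is cheaper than paying on demand."""
    return 1 - RESERVED_DISCOUNT

for u in (0.25, 0.50, 1.00):
    od, rsv = annual_costs(u)
    cheaper = "reserved" if rsv < od else "on-demand"
    print(f"utilization {u:.0%}: on-demand ${od:,.2f}, reserved ${rsv:,.2f} -> {cheaper}")
```

The longer-range forecast therefore answers one question: which instances will run for more than the breakeven share of the term?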
As if predicting capacity changes weren’t hard enough, responding to them is difficult in non-cloud systems.
Capacity managers cannot respond quickly to unplanned changes in demand if it takes months to procure and deploy capacity on-premise. If the IT service goes down, or the business can’t adequately process transactions during highly visible seasonal peaks such as summertime or Christmas (when, unfortunately, many staff are off work), the brand is damaged and customers leave.
Cloud components can be scaled quickly, and even large increases (10,000 VMs, anyone?) can be made in a few hours with some extra communication with the cloud service provider. Plus, the business can scale down just as quickly, turning off all of that excess capacity when the seasonal fluctuation subsides. This can’t be done on-premise; it can only be done in the cloud.
On-premise systems might be measured by the number and size of datacenters, comms rooms, and racks. Adding a server might mean adding another rack, which might in turn mean adding another switch. That might then mean extending the closet or room, or even the datacenter.
To avoid hitting these capacity potholes, long-range capacity management forecasting is done to provide more capacity well ahead of the predicted demand. This is a standard enterprise “best practice” approach that’s wasteful and expensive.
In the cloud, it’s possible to keep adding VMs without worrying about physical infrastructure (account-level service quotas aside, and these can usually be raised), so the granularity of capacity is now a single virtual machine.
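To see why granularity matters, the sketch below compares buying capacity in whole racks with buying it in whole VMs. The unit sizes (400 vCPUs per rack, 4 vCPUs per VM) and the demand figure are hypothetical. The model also ignores the forecasting headroom an on-premise team would add on top.

```python
import math

def provisioned(demand: float, unit: float) -> float:
    """Capacity actually bought when it comes only in whole units of `unit`."""
    return math.ceil(demand / unit) * unit

def idle_fraction(demand: float, unit: float) -> float:
    """Share of the provisioned capacity that sits idle."""
    capacity = provisioned(demand, unit)
    return (capacity - demand) / capacity

DEMAND_VCPU = 1010   # hypothetical current demand
RACK_VCPU = 400      # hypothetical capacity added by one more rack
VM_VCPU = 4          # hypothetical size of one VM

for name, unit in (("rack", RACK_VCPU), ("VM", VM_VCPU)):
    print(f"{name}: provision {provisioned(DEMAND_VCPU, unit):.0f} vCPU, "
          f"{idle_fraction(DEMAND_VCPU, unit):.1%} idle")
```

With 1,010 vCPUs of demand, rack-sized steps provision 1,200 vCPUs, of which about 16% sits idle. VM-sized steps provision 1,012 vCPUs, with well under 1% idle.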
Where higher-order cloud services such as AWS S3 storage can be used, operations are further removed from storage capacity considerations. These services are so scalable that a typical enterprise is unlikely ever to hit their limits, and no capacity management is required in the traditional sense. Capacity management then moves to the question “How efficiently are we using our capacity, and can we save money?”
Responding to expected and unexpected demand causes much stress for a capacity manager. For instance, in a typical fixed-size, on-premise IT system there are physical limits to the processing capacity.
The normal behavior when capacity demand exceeds current supply is to push out or de-prioritize non-production workloads – something has to give. But what if getting a new product live is also business critical, and that is what the non-production workloads are doing? Is the unplanned production demand now delaying an important product release that has already been promised to customers through advertising and other communications?
In the cloud, this is handled differently. Capacity managers can use automated services such as AWS EC2 Auto Scaling to add (and remove) compute capacity manually, on a schedule, or dynamically in response to demand. In practice, the main upper limit to capacity supply is how much the business can afford to spend.
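Dynamic scaling policies typically follow a proportional rule: resize the group so that a per-instance metric such as average CPU moves back toward a target. The sketch below uses the formula Kubernetes documents for its Horizontal Pod Autoscaler (desired = ceil(current × metric / target)). It is a stand-in for illustration, not the internal algorithm of EC2 Auto Scaling. The hypothetical `budget_max_size` helper turns an hourly spending limit into the group's maximum size, so budget becomes the ceiling on supply.

```python
import math

def desired_capacity(current: int, metric: float, target: float,
                     min_size: int, max_size: int) -> int:
    """Proportional scaling rule: resize so the per-instance metric
    (e.g. average CPU %) moves back toward `target`, within the group bounds."""
    desired = math.ceil(current * metric / target)
    return max(min_size, min(max_size, desired))

def budget_max_size(hourly_budget: float, price_per_instance_hour: float) -> int:
    """Largest fleet a hypothetical hourly budget can pay for."""
    return math.floor(hourly_budget / price_per_instance_hour)

# Hypothetical figures: $5.00/hour budget, $0.25 per instance-hour.
max_size = budget_max_size(5.00, 0.25)  # 20 instances
print(desired_capacity(10, metric=90, target=60, min_size=2, max_size=max_size))   # -> 15 (scale out)
print(desired_capacity(15, metric=30, target=60, min_size=2, max_size=max_size))   # -> 8 (scale in)
print(desired_capacity(15, metric=120, target=60, min_size=2, max_size=max_size))  # -> 20 (capped by budget)
```

The last call asks for 30 instances but gets 20. The only thing holding it back is the budget-derived maximum, not physical hardware.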
Pay-as-you-go (PAYG) pricing builds on measured service, one of the five essential cloud characteristics in the NIST definition of cloud computing. This consumption-focused purchasing method means you can align operational expenditure with business need by consuming only the cloud services you need. The on-premise alternative is to purchase hardware and software, and to own (and manage) these assets for a three-, five-, or seven-year period.
Some organizations have budget arrangements that plan capital expenditure annually. This can also be done in the cloud by mixing and matching reserved capacity (committed for a term, such as a year) with PAYG (on-demand) capacity. This lets capacity managers cater for mostly steady but occasionally “bursty” workloads.
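The mix-and-match decision can be framed as choosing how many instances to reserve for the steady baseline, with PAYG covering the bursts. The sketch below uses a hypothetical one-day hourly demand profile. Prices are normalized to 1.0 per on-demand instance-hour, the reservation discount is 60%, and the reservation is modeled as a discounted hourly rate over the whole period. All figures are illustrative. The code simply tries every reserved count and keeps the cheapest.

```python
def mixed_cost(demand: list[int], reserved: int,
               on_demand_price: float, discount: float) -> float:
    """Cost of reserving `reserved` instances for every hour in `demand`
    and buying any hour-by-hour excess on demand (PAYG)."""
    reserved_cost = reserved * on_demand_price * (1 - discount) * len(demand)
    burst_hours = sum(max(0, d - reserved) for d in demand)
    return reserved_cost + burst_hours * on_demand_price

def best_reserved_count(demand: list[int], on_demand_price: float,
                        discount: float) -> int:
    """Try every reserved count from 0 to peak demand and keep the cheapest."""
    return min(range(max(demand) + 1),
               key=lambda r: mixed_cost(demand, r, on_demand_price, discount))

# Hypothetical day: 10 instances for 20 hours, a 30-instance burst for 4 hours.
day = [10] * 20 + [30] * 4
best = best_reserved_count(day, on_demand_price=1.0, discount=0.60)
print("reserve:", best, "cost:", mixed_cost(day, best, 1.0, 0.60))
```

Each of the 10 steady-state instances runs all day, so reserving it pays. An 11th reserved instance would cost 9.6 per day but replace only 4 on-demand instance-hours, so the burst is cheaper on PAYG.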
Another way to mix OpEx with CapEx is the so-called hybrid cloud model: combine CapEx-laden on-premise systems with OpEx-savvy public cloud, handling steady-state workloads on-premise and the fluctuations in the public cloud. That is, if you can achieve this technically, architecturally, and operationally.
Capacity management is still important in the cloud, but it is different. The old constraints no longer apply. A modern capacity manager is constrained mainly by budget (and its efficient use) and by a workload’s ability to exploit cloud architecture for auto scaling.