The Fast Company Executive Board is a private, fee-based network of influential leaders, experts, executives, and entrepreneurs who share their insights with our audience.

How to mitigate the risk of cloud downtime

There are many benefits to operating in the cloud, but there are also risks and consequences that may hurt revenue and reputation—especially for an unprepared business. 

There are many benefits to operating in the cloud, like reducing the cost of owning and running in-house servers and applications. But there are also risks and consequences that could hurt revenue or reputation, especially for an unprepared business.

Consider the AWS outage of December 7, 2021, which impacted organizations like Netflix, The Associated Press, Delta Air Lines, and Toyota. Such outages usually inspire a deluge of sound and fury from cloud-service users and media, and the affected organizations lose transaction revenue, productivity, and user confidence.

Cloud outages are not uncommon. Data from Parametrix indicates that one of the three major public cloud providers—Amazon, Google, and Microsoft—experienced an outage of at least 30 minutes every three weeks in 2021.

While a half-hour is not a long time, it is an eternity to the companies whose revenues and reputations are based on the availability of cloud applications and services. The fact that companies have no control over restoring third-party cloud platforms, and have no idea how long an outage will last, only makes matters worse. And to users who expect uninterrupted availability, or have an urgent need for a company’s service at the time, it will also feel like an eternity.

Since the large public cloud vendors, which attract the best talent in cloud systems engineering, can’t avoid outages, neither can their customers. Still, all organizations can prepare for cloud downtime by mitigating the risk and having downtime procedures in place.

RISKS OF OPERATING IN THE CLOUD

Vendor lock-in is a big risk of operating in the cloud, as it’s difficult to migrate to another public cloud provider and replicate the functionality of systems built on the current provider. Loss of control is another issue; the cloud provider owns and operates the hardware and the software applications that run a customer’s business. Public-cloud customers are also dependent on the cloud provider’s future. As the cloud provider navigates its business, so go the customers’ cloud applications and infrastructure.

Finally, and most important, is the risk of downtime. When a public cloud provider’s systems go down, so too do customers’ cloud-based systems, and a list of consequences quickly follows.

CONSEQUENCES OF CLOUD DOWNTIME

Perhaps the most obvious consequence of cloud downtime is direct revenue loss. When the cloud goes down, a transaction-based business will have no transactions, and customers may take their business elsewhere. Recovery costs also add up. When the public cloud service is restored, the customer’s IT professionals need to test and ensure their systems have also been restored.

Service-level agreement (SLA) liability comes into play when an SLA includes provisions for uptime; if a threshold is breached, a company may need to pay service credits or hard cash to customers. Legal liability is especially an issue for organizations in regulated industries, as they could be sued by the SEC, other entities, or individuals for not being available.
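
To make the uptime threshold concrete, here is a minimal sketch that converts an uptime percentage into a downtime budget and checks whether a single outage exceeds it. The 30-day measurement period and the function names are assumptions for illustration; actual SLAs define their own measurement windows, exclusions, and credit schedules.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime that an uptime target permits over the period."""
    total_minutes = period_days * MINUTES_PER_DAY
    return total_minutes * (100.0 - uptime_percent) / 100.0


def breaches_sla(outage_minutes: float, uptime_percent: float,
                 period_days: int = 30) -> bool:
    """True if this outage alone exceeds the downtime budget for the period."""
    return outage_minutes > allowed_downtime_minutes(uptime_percent, period_days)


if __name__ == "__main__":
    # Hypothetical uptime targets, checked against a 30-minute outage.
    for target in (99.9, 99.95, 99.99):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows {budget:.1f} min per 30 days; "
              f"30-minute outage breaches: {breaches_sla(30, target)}")
```

Under these assumptions, a 99.9% target leaves about 43 minutes of downtime per 30 days, so a single half-hour outage fits within the budget. A 99.95% target leaves about 22 minutes, so the same outage would breach it.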

Productivity is lost when employees are unable to work because their cloud-based systems are down. Finally, and perhaps most important, is reputation loss. Customers and users have increasingly low tolerance for downtime, and a company’s reputation suffers, along with its customers, when its cloud-based services are unavailable.

MITIGATING THE RISKS OF CLOUD DOWNTIME 

There are technical, financial, and human resources that can be applied to mitigate the risks of cloud downtime. Ideally, all should be in place.

The potential technical solutions include multi-cloud redundancy, that is, duplicating resources on more than one cloud to provide failover to a second cloud platform if the first suffers an outage. Multi-cloud redundancy is expensive: it requires experts on each cloud platform to ensure everything is communicating and syncing, and it requires duplicating resources, including applications, compute, and storage. In theory, multi-cloud redundancy is an excellent option, but more often than not it proves too expensive and complex for all but the most sophisticated organizations.
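
The failover decision itself can be sketched in a few lines. The example below is a simplified illustration, not a production design: the platform names, the `choose_platform` function, and the health-check callback are all hypothetical, and a real setup would also have to handle data synchronization and traffic routing.

```python
from typing import Callable, Optional, Sequence


def choose_platform(platforms: Sequence[str],
                    is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first platform, in priority order, that passes its health check."""
    for name in platforms:
        try:
            if is_healthy(name):
                return name
        except Exception:
            # A health check that errors out is treated as "unhealthy".
            continue
    return None  # Nothing is healthy; downtime procedures take over.


# Hypothetical status table standing in for real health probes.
status = {"primary-cloud": False, "secondary-cloud": True}
active = choose_platform(["primary-cloud", "secondary-cloud"],
                         lambda name: status[name])
print(active)  # prints "secondary-cloud"
```

Passing the health check in as a function keeps the selection logic independent of any one provider's monitoring tools.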

Many organizations don’t have or don’t want to dedicate the resources required for multi-cloud redundancy. Instead, they build resilience by implementing redundancy across multiple regions within a single public cloud provider. This is much cheaper and easier to manage than multi-cloud redundancy. Cloud regions are usually independent, so this option provides strong redundancy against outages confined to a single region.
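
A rough calculation shows why independence matters. The sketch below assumes that regions fail independently and that the service stays up as long as at least one region is up. Both are simplifications, since failover takes time and outages can be correlated across regions. The 99.9% per-region availability is a hypothetical figure, not a provider statistic.

```python
HOURS_PER_YEAR = 365 * 24


def combined_availability(region_availability: float, regions: int) -> float:
    """Availability when the service needs only one of several independent regions."""
    # The service is down only if every region is down at the same time.
    return 1.0 - (1.0 - region_availability) ** regions


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    single = 0.999  # hypothetical availability of one region
    for n in (1, 2):
        a = combined_availability(single, n)
        print(f"{n} region(s): availability {a:.6f}, "
              f"about {expected_downtime_hours(a):.4f} hours of downtime per year")
```

Under these assumptions, one region at 99.9% implies roughly 8.76 hours of downtime a year, while two independent regions bring the expected figure down to well under a minute. Correlated failures would erode that gain.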

Solutions for mitigating the financial risks of cloud downtime primarily include insurance that provides compensation for each hour of downtime a public cloud provider experiences, enabling companies to cover losses and expenses incurred during an outage.
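
As a back-of-the-envelope illustration of hourly compensation, the sketch below computes a payout and the loss it leaves uncovered. The dollar figures are invented, and prorating by the minute is an assumption; real policies set their own waiting periods, limits, and rounding rules.

```python
def outage_payout(outage_minutes: float, hourly_compensation: float) -> float:
    """Payout for one outage, prorated by the minute (an assumption)."""
    if outage_minutes < 0 or hourly_compensation < 0:
        raise ValueError("outage duration and compensation must be non-negative")
    return hourly_compensation * outage_minutes / 60.0


def uncovered_loss(estimated_loss: float, payout: float) -> float:
    """Portion of the estimated loss that the payout does not cover."""
    return max(0.0, estimated_loss - payout)


# Hypothetical figures: a 90-minute outage, $10,000 per hour of cover,
# and an estimated $22,000 in lost revenue and recovery costs.
payout = outage_payout(90, 10_000)
gap = uncovered_loss(22_000, payout)
print(payout, gap)  # prints "15000.0 7000.0"
```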

On the human side of mitigation efforts, it helps to have a disaster recovery plan, and in-house or consulting DevOps and site reliability engineering (SRE) teams that can quickly locate a problem—whether it lies with the public cloud provider or the customer—and fix the problem if it lies with the customer. Such teams also need to ensure their systems are functioning normally after restoration.

Cloud-based systems are here to stay because they offer tremendous advantages, like rapid deployment, scalability, and a reduced need for in-house support. Although cloud outages and downtime are also here to stay, foresight and preparation will help public cloud consumers and their customers weather the storm when outages occur.

Neta Rozy, Co-founder and Chief Technology Officer, Parametrix Insurance
