Coronavirus – Cloud Availability and Cost Management

With the brick-and-mortar world shutting down due to Covid-19, much of our lives has moved online. With many applications and services running in the cloud, there is a massive strain not just on internet bandwidth, but also on cloud compute resources. Evidence of this includes the EU asking Netflix and YouTube to stop streaming HD content, Microsoft reporting an 800% increase in cloud consumption, and Zoom (which runs in part on AWS) seeing its daily users more than quadruple.

Even under normal circumstances, ensuring high availability while driving cloud-cost efficiency can be challenging. With Covid-19, this has become much more complex and, for many companies, a mission-critical task, because more traffic and usage don’t always translate into more revenue.

During this period, it is essential to understand how to best leverage cloud compute capacity and to work with solutions that automate and optimize workload provisioning.

Challenges with managing lower-cost reserved capacity and spot instances during Coronavirus

Typically, companies looking to reduce costs turn to compute-pricing options that require a long-term commitment, such as reserved instances and savings plans. However, with Covid-19’s impact still unknown, financial lock-in for 1 or 3 years can be a risky proposition without comprehensive management.
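
To make the lock-in risk concrete, here is a minimal sketch of the break-even arithmetic. It assumes a simplified model in which a commitment is billed for every hour of the term at a fixed discount off the on-demand rate, whether or not the capacity is used. The discount and utilization figures are hypothetical and are not actual AWS prices.

```python
def break_even_utilization(discount: float) -> float:
    """Fraction of the term a workload must run for a commitment to pay off.

    Simplified model (an assumption): the commitment is billed for every hour
    at (1 - discount) times the on-demand rate, used or not, while on-demand
    is billed only for the hours actually used.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


def commitment_saves_money(discount: float, expected_utilization: float) -> bool:
    """True when the commitment costs less than on-demand for the expected usage."""
    return expected_utilization > break_even_utilization(discount)


# Hypothetical figures: with a 40% discount, the capacity must be in use
# for more than 60% of the term before the commitment is the cheaper option.
print(break_even_utilization(0.40))
print(commitment_saves_money(0.40, 0.75))
print(commitment_saves_money(0.40, 0.50))
```

Under this model, the less certain your future utilization is, the more likely a 1- or 3-year commitment ends up costing more than paying on-demand.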

Leveraging cloud providers’ highly affordable excess capacity, such as AWS’s spot instances, is attractive because there is zero lock-in, but the current, sustained spike in cloud consumption has made it extremely difficult to provision and hold onto spot instances. In fact, we have seen many companies, some of whom have tried the DIY (do-it-yourself) approach to leveraging spot instances, come to us asking for help.

Let’s take a closer look at the spot instance market in today’s climate and how you can successfully run your workloads, from single, stateful instances, to cloud-native clusters made up of thousands of nodes, affordably and reliably. 

Unprecedented cloud market congestion

Starting in February 2020, which coincides with the marked escalation of the world’s reaction to Covid-19, we have seen the highest levels of congestion ever recorded across all AWS markets. This surge in online activity has even surpassed holiday periods, which usually peak between October and December. Not only are more resources going up on our platform, but we also see many AWS customers moving to spot instance markets as they look to increase savings from their infrastructure during this economically uncertain time. With more people at home, ordering, working and living life online, cloud computing resources are at a premium.

[Figure: spot instance interruptions in a single region]
Spot instance replacement levels from April 2019 – April 2020. While this graph shows data from a single AWS region, it is representative of the trend we see in other AWS regions.

Peeking under the hood – how Spot maintains high availability while keeping costs low

With years of experience maintaining high availability for mission-critical applications during Black Friday, Cyber Monday and other cyclical spikes, we at Spot are well prepared to keep cloud costs within budget while handling extreme cloud market volatility.

At a high level, here is how we do it:

  • More data = better decisions. Continuous, long-term monitoring of ALL spot instance markets provides a broader dataset than just looking at the current “point-in-time” status of a particular spot instance market. This enables us to choose the most viable spot instances with the greatest potential longevity. Additionally, our ability to match your resource requirements (e.g. vCPU) to the broadest selection of instance families, combined with this view of the health of all spot instance markets, enables us to make the best decisions for workload placement.
  • Proactive spot instance replacements. When a spot instance interruption looks likely, we proactively replace the affected instance(s) in the congested markets well in advance, allowing for graceful draining and seamless continuation of the workloads in a less congested spot instance market.
  • Optimized usage of spot instances, on-demand instances and reserved capacity. In scenarios where there are no available spot instances, your workload will automatically fall back to on-demand instances. Once relevant spot instances become available, your workload is returned to those spot instances for maximum savings. If you have unused reserved instances or savings plans, workloads will be prioritized to run on them, but will revert to spot instances whenever that helps maximize overall cost-efficiency.
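
The placement ordering described in the list above can be illustrated with a short sketch. This is not Spot's implementation. The `Market` type, the `interruption_risk` score, the 0.5 risk threshold and the prices are all hypothetical, chosen only to show the order of preference: use already-paid reserved capacity first, then the healthiest viable spot market, and fall back to on-demand when no spot market qualifies.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Market:
    """One spot market, as seen by a hypothetical monitoring system."""
    instance_type: str
    vcpus: int
    spot_price: float         # hypothetical hourly price
    interruption_risk: float  # hypothetical score in [0, 1]; lower is healthier
    available: bool


def choose_placement(
    required_vcpus: int,
    markets: Iterable[Market],
    unused_reserved_types: Set[str],
    max_risk: float = 0.5,    # hypothetical cut-off for a "congested" market
) -> Tuple[str, str]:
    """Return (pricing_model, instance_type) for a single workload."""
    # Match the requirement against every instance type that is large enough.
    fits = [m for m in markets if m.vcpus >= required_vcpus]
    if not fits:
        raise ValueError("no instance type satisfies the vCPU requirement")

    # 1. Unused reserved capacity is already paid for, so use it first.
    for m in fits:
        if m.instance_type in unused_reserved_types:
            return ("reserved", m.instance_type)

    # 2. Otherwise pick the least congested available spot market,
    #    breaking ties on price.
    viable = [m for m in fits if m.available and m.interruption_risk <= max_risk]
    if viable:
        best = min(viable, key=lambda m: (m.interruption_risk, m.spot_price))
        return ("spot", best.instance_type)

    # 3. No healthy spot market: fall back to the smallest fitting on-demand type.
    return ("on-demand", min(fits, key=lambda m: m.vcpus).instance_type)


# Hypothetical market snapshot (prices and risk scores are made up).
markets = [
    Market("m5.large", 2, 0.03, 0.8, True),
    Market("m5.xlarge", 4, 0.05, 0.2, True),
    Market("c5.xlarge", 4, 0.04, 0.3, True),
]
print(choose_placement(4, markets, set()))
print(choose_placement(4, markets, {"c5.xlarge"}))
print(choose_placement(2, markets, set(), max_risk=0.1))
```

Re-running the same decision as market conditions change is what would move a workload back from on-demand to spot once a healthy market reappears.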

Of course, there is much more to it. For a fuller understanding of how Spot can help you during Covid-19 (and afterward), please see the relevant resources for the following use cases:

Containerized workloads

Reserved capacity

Stateful workloads 

Web applications

Stay safe!