Understanding Excess Cloud Capacity: Amazon EC2 Spot Instances vs. Azure Spot VMs vs. Google Spot VMs

May 5, 2023

Spot instances (also called spot virtual machines, or VMs) are transforming how people consume public cloud services. Cloud providers sell their excess capacity as these short-lived instances at a much lower cost than on-demand or reserved instances.

The top three cloud providers — Amazon Web Services (AWS), Microsoft Azure, and Google Cloud — offer spot instances to their users. Before provisioning these resources, it’s vital to compare the spot instances offered by these cloud providers and evaluate their strengths and weaknesses, the kinds of workloads they support, their pricing models, and other factors.

Why use spot instances/spot VMs?

Cloud providers use the spot market to monetize their excess capacity. Rather than let idle instances go to waste, they sell them at a steep discount to drive usage. Spot prices vary with supply and demand, but users can save up to 90% compared to the cost of on-demand instances.
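To make the arithmetic concrete, here is a minimal sketch that computes the spot price and total savings for a given discount. The $0.10 hourly on-demand rate and the 730-hour month are illustrative assumptions, not quoted prices, and the 90% figure is the upper bound mentioned above rather than a typical result.

```python
def spot_savings(on_demand_hourly: float, discount: float, hours: float) -> dict:
    """Return the spot hourly price and total savings for a fractional discount (0.0 to 1.0)."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    spot_hourly = on_demand_hourly * (1.0 - discount)
    on_demand_total = on_demand_hourly * hours
    spot_total = spot_hourly * hours
    return {
        "spot_hourly": round(spot_hourly, 4),
        "on_demand_total": round(on_demand_total, 2),
        "spot_total": round(spot_total, 2),
        "savings": round(on_demand_total - spot_total, 2),
    }


# Assumed inputs: $0.10/hour on-demand, 90% discount, one 730-hour month.
example = spot_savings(on_demand_hourly=0.10, discount=0.90, hours=730)
print(example)
```

Because spot prices move with supply and demand, a real estimate would use the discount observed for a specific region and instance type rather than a single fixed value.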

While the three major cloud providers do have similar offerings for their excess capacity, there are key differences in terms of the benefits and use cases.

AWS EC2 Spot Instances

Since 2009, AWS has offered Amazon EC2 Spot Instances based on their excess capacity in a spot market. These instances are available at discounts of up to 90% when compared with AWS on-demand instance pricing.

AWS originally sold spot instances through a bidding model. It now sells them based on the excess capacity available in its inventory, on a first-come, first-served basis. (The bidding model is now optional.) EC2 Spot Instances can be used along with other AWS services like EMR, Auto Scaling, Elastic Container Service (ECS), Data Pipeline, and AWS Batch.

Strengths

Two-minute advance notice on spot instance removal, giving time for users to gracefully shut down or fall back to other instances
No time limit on the life of the spot instance
Offers Spot Fleet as a way to orchestrate and manage spot instances along with on-demand instances based on a target price or target distribution across pools
Spot Instance Advisor allows users to determine where there will be the least disruption across regions
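The two-minute notice listed above is exposed to the instance through its metadata service. The sketch below only parses a notice and reports how much time is left. The sample payload and the fixed clock value are made up for illustration, and actually fetching the metadata item (including any token handling) is deliberately left out.

```python
import json
from datetime import datetime, timezone

# Metadata path that carries the interruption notice once one is scheduled.
# Fetching it over HTTP is intentionally omitted from this sketch.
SPOT_ACTION_PATH = "/latest/meta-data/spot/instance-action"


def seconds_until_interruption(notice_json: str, now: datetime) -> float:
    """Parse a notice such as {"action": "terminate", "time": "2023-05-05T12:02:00Z"}
    and return the number of seconds left before the action takes effect."""
    notice = json.loads(notice_json)
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()


# Illustrative payload and clock value (assumptions, not captured from a real instance).
sample_notice = '{"action": "terminate", "time": "2023-05-05T12:02:00Z"}'
now = datetime(2023, 5, 5, 12, 0, 0, tzinfo=timezone.utc)
remaining = seconds_until_interruption(sample_notice, now)
print(f"{remaining:.0f} seconds left to drain and shut down")
```

A shutdown handler could poll for the notice every few seconds and start draining work as soon as it appears.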

Weaknesses

Spot Instances are still not “application aware,” limiting the support for certain transient use cases
AWS cannot commit to consistent availability of this type of EC2 capacity

Considerations

AWS EC2 Spot Instances are useful for fault-tolerant and flexible applications, such as big data, containerized workloads, high-performance computing (HPC), stateless web servers, rendering, CI/CD, and other test and development workloads. With Spot Fleet, you can use automation scripts to move long-running workloads to other available instances (including on-demand instances) so that they outlive any individual spot instance. This adds operational overhead and introduces a risk of failures. Even so, with the right automation and analytics, and a mix of spot, on-demand, and reserved instances, it is possible to run varied and mission-critical workloads.
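One way to picture the spot and on-demand mix described above is as a simple capacity split: a fixed on-demand base that is always kept, plus a percentage of the remaining capacity, with everything else on spot. The function and parameter names below are hypothetical, loosely modeled on the base-plus-percentage settings found in mixed-instance scaling policies, and rounding the on-demand share up is an assumption of this sketch.

```python
import math


def split_capacity(target: int, on_demand_base: int, on_demand_pct_above_base: int) -> dict:
    """Split a target instance count into on-demand and spot portions.

    The first `on_demand_base` instances are always on-demand. Of the remainder,
    `on_demand_pct_above_base` percent is on-demand (rounded up) and the rest is spot.
    """
    if target < 0 or on_demand_base < 0:
        raise ValueError("counts must be non-negative")
    if not 0 <= on_demand_pct_above_base <= 100:
        raise ValueError("percentage must be between 0 and 100")
    base = min(target, on_demand_base)
    remainder = target - base
    extra_on_demand = math.ceil(remainder * on_demand_pct_above_base / 100)
    return {"on_demand": base + extra_on_demand, "spot": remainder - extra_on_demand}


# Assumed example: 20 instances, 4 always on-demand, 25% of the rest on-demand.
plan = split_capacity(target=20, on_demand_base=4, on_demand_pct_above_base=25)
print(plan)
```

Raising the base or the percentage trades some of the discount for a larger share of capacity that cannot be interrupted.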

Azure Spot VMs

Azure Spot VMs are offered from the excess capacity in Azure data centers. Like the AWS and Google offerings, Azure Spot VMs provide discounts of up to 90% versus on-demand pricing, depending on the region, VM type, and capacity available when the workload is deployed.

Azure Spot VMs are available as a part of Azure Batch and VM scale sets. These instances are useful for batch processing workloads such as media rendering and other large processing jobs, certain dev/test workloads, and demos.

Strengths

Fixed pricing offering some predictability in costs
Can work natively inside Azure Batch pools alongside on-demand instances
Unlike earlier Low Priority VMs, Azure Spot VMs are available in both VM scale sets and as single VMs

Weaknesses

Scaling to hundreds of VMs can be problematic, and some VMs are often unavailable
Limited visibility into resource availability, which makes planning and porting VMs difficult
Limited integration with other Azure services, which restricts how developers can combine these instances with the rest of the platform

Considerations

Azure Spot VMs are useful in certain use cases, but they are not available for all types of instances and workloads. Azure recommends using them for dev/test environments, big data applications, stateless workloads, batch jobs, and high-performance computing scenarios. Since Azure Spot VMs are limited in their features and supported use cases, it is important to consider how these VMs can be leveraged to run other types of workloads using automation.

Google Cloud Spot VMs

Spot VMs offered by Google Cloud are short-lived, low-cost virtual machines that can help users run batch jobs, fault-tolerant workloads, or other short-lived workloads. They are similar to Amazon EC2 Spot Instances and Azure Spot VMs but with some important differences.

With fixed-price Spot VMs, you can save up to 91% versus Google's on-demand instance pricing. You can also use Spot VMs to run containerized workloads and node pools in Google Kubernetes Engine (GKE).

Strengths

Fixed discount and pricing with no uncertainties on the cost, and you only pay for instances used
No limitation on the instance type
Provisioning Spot VMs is easy and only requires adding one option to the command line
Unlike Google’s earlier pre-emptible VMs, Spot VMs do not have a minimum or maximum runtime unless you specifically limit the runtime

Weaknesses

Only 30 seconds of notice before the instance is removed. Though this is enough for a graceful shutdown, it may be too short for certain failover use cases.
According to internal research and statistical information, certain instance types are interrupted after less than six hours. This is a major concern for running production workloads on Spot VMs, even stateless web servers and containers.
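Whether a notice window is long enough depends on what the workload has to do before it stops. The sketch below checks a shutdown plan against the two windows quoted in this article (two minutes for AWS, 30 seconds for Google). The step names, their durations, and the five-second safety margin are all hypothetical.

```python
from typing import Mapping

# Notice windows as stated in this article, in seconds.
NOTICE_SECONDS = {"aws": 120, "google": 30}


def fits_notice_window(provider: str, step_seconds: Mapping[str, float],
                       safety_margin: float = 5.0) -> bool:
    """Return True if the shutdown steps, run one after another, finish
    within the provider's notice window minus a safety margin."""
    budget = NOTICE_SECONDS[provider] - safety_margin
    return sum(step_seconds.values()) <= budget


# Hypothetical drain plan with estimated durations in seconds.
drain_plan = {
    "stop_accepting_requests": 2,
    "finish_in_flight_requests": 20,
    "flush_checkpoint": 15,
}

for provider in NOTICE_SECONDS:
    ok = fits_notice_window(provider, drain_plan)
    print(provider, "fits" if ok else "does not fit")
```

A plan that does not fit the shorter window usually needs more frequent checkpoints or a smaller unit of work, so that less has to happen after the notice arrives.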

Considerations

Google Spot VMs are suitable for batch jobs and other fault-tolerant workloads and, on their own, may not suit other types of workloads. Organizations must be careful before considering Spot VMs for production or mission-critical environments. The VMs are short-lived, and handling production and mission-critical workloads on them adds operational overhead. To leverage Spot VMs for such workloads, organizations should consider using platforms that use automation and analytics to provide SLAs consistent with these workloads.

Make the most of excess cloud capacity discounts

Every major cloud provider offers spot instances with a varying set of features. On their own, these short-lived instances suit mainly batch and fault-tolerant workloads, and extending them to other use cases requires additional operational overhead.

Many enterprises fall back on on-demand and reserved instances for most of their workloads. However, this is a suboptimal way to use cloud infrastructure. With automation and analytics, organizations can use a mix of spot instances, on-demand instances, and reserved instances to run many workloads with guaranteed SLAs.

Want to scale mission-critical workloads using spot instances? Run stateless and stateful workloads in AWS, Azure, or Google Cloud and save money on your cloud compute with Elastigroup.
