What are Spot Instances?
Spot instances are an excellent way to reduce your EC2 costs by up to 90%. In the past few years, a growing number of companies, from SMBs to enterprises, have been using spot instances even for mission-critical and production workloads, which has helped them greatly optimize their cloud costs.
Read on to find out how you too can benefit from spot instances.
The basics of EC2 pricing
AWS offers four primary pricing models to pay for and use EC2 (Elastic Compute Cloud) instances. It’s important to understand all of them, as an optimized cloud portfolio will benefit from using a balanced blend of all pricing models.
Note: No matter which pricing model you select, there is no difference in the actual compute resource you receive (i.e. the underlying virtual machines will be the same). The only differences are the amount you pay and the corresponding level of compute availability that you receive.
Spot instances: This option offers up to 90% cost reduction compared to on-demand pricing. Spot instances represent AWS’s excess capacity, which AWS, as a cloud provider, needs to keep available for any surges in customer demand. To offset the cost of idle infrastructure, AWS offers this excess capacity at a massive discount to drive usage. However, this discounted pricing comes with the caveat that AWS can “pull the plug” and terminate spot instances with just a 2-minute warning. These interruptions occur when AWS needs to draw from the excess capacity to serve customers who purchased reserved instances, savings plans or on-demand instances (see below). Sudden interruption of EC2 instances can result in data loss, service degradation, unavailable services and the like, which makes spot instances a challenge for mission-critical, production workloads. Read more about spot instances on the AWS website.
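To make the 2-minute warning concrete, here is a minimal sketch of how a process running on a spot instance might poll the EC2 instance metadata service for a pending interruption. It assumes IMDSv2 is enabled and that the code runs on the instance itself; the function names `parse_instance_action` and `check_for_interruption` are illustrative, not part of any AWS SDK.

```python
import json
import urllib.error
import urllib.request

# Link-local address of the EC2 instance metadata service.
IMDS = "http://169.254.169.254/latest"


def parse_instance_action(body: str) -> dict:
    """Extract the action and scheduled time from an instance-action notice."""
    notice = json.loads(body)
    return {"action": notice["action"], "time": notice["time"]}


def check_for_interruption(timeout: float = 1.0):
    """Return the pending interruption notice, or None if none is scheduled.

    Only works when called from inside an EC2 instance.
    """
    # IMDSv2 requires a short-lived session token obtained with a PUT request.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    token = urllib.request.urlopen(token_req, timeout=timeout).read().decode()

    action_req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        body = urllib.request.urlopen(action_req, timeout=timeout).read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # the path does not exist until a notice is issued
            return None
        raise
    return parse_instance_action(body)
```

A worker would typically call `check_for_interruption()` every few seconds and, once it returns a notice, stop accepting new work and flush state before the scheduled time.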
On-demand instances: This option is essentially pay-as-you-go, allowing you to spin EC2 instances up and down at will, provided that the instance type you want is available when you want it. On-demand is the most expensive of the four options. Read more about on-demand instances on the AWS website.
Reserved instances: This option consists of an upfront financial commitment to 1 or 3 years of EC2 usage, which in turn provides guaranteed capacity for the selected instance type. Savings compared to on-demand can reach up to roughly 75%. However, reserved instances create financial lock-in, so if you don’t use what you committed to, you could end up with a negative ROI. Read more about reserved instances on the AWS website.
Savings plans: This option is similar to reserved instances in its commitment terms of 1 or 3 years, but it does not require you to select a specific instance type. Instead, it can be applied to any EC2 instance (as well as other AWS services). For example, you can commit to spend a desired amount per hour, e.g. $35/hour, for either 1 or 3 years. Usage up to the $35 commitment is charged at Savings Plans rates (savings of up to 66% to 72%, depending on the plan type). Any usage beyond the committed amount is charged at on-demand rates. Read more about savings plans on the AWS website. You can also read more about savings plans and reserved instances on the Cloud Academy blog.
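The $35/hour example can be worked through with a small calculation. This is a simplified sketch, assuming a single flat discount across all usage (real Savings Plans rates vary by instance type and region) and that the commitment is billed whether or not it is fully used; `hourly_bill` and the 50% discount in the usage example are purely illustrative.

```python
def hourly_bill(on_demand_usage: float, commitment: float, discount: float) -> float:
    """Estimate one hour's charge under a savings plan.

    on_demand_usage: what the hour's usage would cost at on-demand rates.
    commitment:      committed hourly spend, measured at discounted rates.
    discount:        assumed flat savings plan discount (0.5 means 50% off).
    """
    discounted_cost = on_demand_usage * (1 - discount)
    if discounted_cost <= commitment:
        # All usage fits inside the commitment; the commitment is still billed.
        return commitment
    # On-demand value of the usage that the commitment absorbs.
    covered = commitment / (1 - discount)
    # The remainder overflows and is billed at full on-demand rates.
    return commitment + (on_demand_usage - covered)


# $100/hour of on-demand-equivalent usage, $35/hour commitment, 50% discount:
# the commitment covers $70 of usage, and the remaining $30 is billed on-demand.
print(hourly_bill(100, 35, 0.5))
```

With the same commitment and only $50/hour of usage, the bill stays at the committed $35, which is why over-committing can erode the savings.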
When should you use spot instances
The general perception of spot instances is that they are ideal for web services, containerized applications or other stateless, fault-tolerant workloads. However, in reality, they can also be used for a much broader set of use cases, without any significant impact on availability or performance. Here are some examples:
- Stateful applications typically require data and IP persistence. With automated solutions, even in the event of spot instance replacement, your workload will immediately restart in the desired Availability Zone, from the same exact data point, maintaining root and data volumes as well as private and public IPs.
- Machine Learning is another area where unplanned spot instance interruptions can set back deep learning and training progress. But with the right tools, you can successfully run all your ML projects on spot instances.
- CI/CD operations, whether Jenkins, Chef, GitLab or others, can be run at scale on spot instances quite easily.
- Big Data workloads running on AWS EMR, Hadoop or Spark are great candidates for spot instances.
- Distributed DBs such as Elasticsearch, Cassandra and MongoDB, which can handle a “reboot” of a single instance without losing data or affecting service, can also run on spot instances.
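For long-running jobs such as ML training, one common way to tolerate interruptions is periodic checkpointing. The following is a minimal sketch of that idea, not a description of any particular tool: it assumes the checkpoint path lives on storage that survives instance replacement (for example a persisted volume), and `run_job`, `save_checkpoint` and `load_checkpoint` are hypothetical names. The `stop_after` parameter only simulates an interruption.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write state to a temp file, then atomically swap it into place."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp, path)  # a crash mid-write never leaves a partial checkpoint


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"step": 0, "total": 0}
    with open(path) as handle:
        return json.load(handle)


def run_job(path: str, total_steps: int, stop_after=None) -> dict:
    """Run (or resume) a job that accumulates the sum of its step indices."""
    state = load_checkpoint(path)
    steps_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            break  # simulated spot interruption
        state["total"] += state["step"]  # stand-in for one unit of real work
        state["step"] += 1
        save_checkpoint(path, state)
        steps_this_run += 1
    return state
```

If the first run is cut off after four steps, a second call to `run_job` with the same path continues from step 4 rather than starting over.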
The evolution of the AWS spot instance market
In the early days of spot instances, the price was determined by real-time bids and the number of available spot instances. Say, for example, there were 5 available spot instances and 6 bidders (each bidding for a single instance). The 5 top bidders would each get a spot instance, at a price set by the lowest of those 5 winning bids. The 6th bidder, with the lowest bid overall, would be left without an instance. Of course, there were other variables, but this example illustrates how, in the past, you could avoid interruptions by bidding well above the on-demand rate so that you’d likely remain in the group of top bidders (back then you could bid up to 10x the on-demand rate).
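The 5-instances, 6-bidders example can be sketched in a few lines. This is only a toy model of the simplified auction described above, not AWS's actual historical pricing algorithm; the bidder names and bid values are made up.

```python
def clear_spot_auction(bids: dict, capacity: int):
    """Toy auction: the top `capacity` bidders win and all pay the lowest winning bid.

    bids maps a bidder name to a bid for one instance (dollars per hour).
    Returns (winners, clearing_price, losers).
    """
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winners = ranked[:capacity]
    losers = ranked[capacity:]
    clearing_price = winners[-1][1] if winners else None
    return (
        [name for name, _ in winners],
        clearing_price,
        [name for name, _ in losers],
    )


bids = {"a": 0.30, "b": 0.25, "c": 0.22, "d": 0.20, "e": 0.18, "f": 0.10}
winners, price, losers = clear_spot_auction(bids, capacity=5)
print(winners, price, losers)
```

Here bidders "a" through "e" each get an instance at $0.18, the lowest winning bid, while "f" is left out. Raising a bid far above the others changes nothing about the price paid but keeps the bidder safely among the winners.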
Today, the price is set by AWS based on “long-term trends in supply and demand for spot instance capacity”. Additionally, the maximum price you can pay for a spot instance is its on-demand rate. So you don’t need to worry about losing your instance due to being outbid. However, one of the consequences of this is that AWS will randomly select the spot instances to terminate in response to surges in market need, for on-demand or reserved capacity. As such, playing the bidding game no longer provides any protection against spot instance interruption.
On the other hand, with spot instance markets or pools being based on the instance type, size, OS and availability zone, you have a huge number of potential spot instance markets to run your workload in, each with their own unique and ever-changing pattern of interruptions.
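To get a feel for how quickly the number of markets grows, the pools can be enumerated as combinations of instance type (including size), OS and Availability Zone. The instance types, OS labels and zones below are example values chosen for illustration.

```python
from itertools import product


def spot_pools(instance_types, operating_systems, availability_zones):
    """List every (instance type, OS, AZ) combination, i.e. every candidate pool."""
    return list(product(instance_types, operating_systems, availability_zones))


pools = spot_pools(
    ["m5.large", "m5.xlarge", "c5.large"],        # type and size together
    ["Linux/UNIX", "Windows"],
    ["us-east-1a", "us-east-1b", "us-east-1c"],
)
print(len(pools))  # 3 types x 2 OSes x 3 zones
```

Even this small selection yields 18 distinct pools, and a workload that can run on more instance types or zones has correspondingly more places to land when one pool is interrupted.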
How to check and select pricing for spot instances
You can find spot instance pricing on AWS’s spot instance pricing page as well as on the spot instance advisor page. This will help you determine the savings you can achieve in comparison to on-demand.
Regarding the selection of your spot instance price, there is no real benefit to bidding higher than the default, on-demand price.
This is true because:
- AWS determines the actual price based on overall market trends, not on specific, real-time bids
- Spot instance interruptions are randomly determined by AWS without any relation to your bid price
Therefore, you can safely leave maximum price at whatever default AWS has it set to.
If, however, you only wish to spend a very specific amount, whether below the on-demand rate or even below the current spot instance rate, you can check historical spot instance prices and specify your desired maximum price. That way, your EC2 spot instance will only run when the actual market price is at or below the maximum you specified.
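As a rough illustration of how historical prices can inform that decision, the sketch below summarizes a list of price records. It assumes records shaped like the `SpotPriceHistory` entries returned by the EC2 `DescribeSpotPriceHistory` API (a `SpotPrice` string and a `Timestamp`); the sample records, the on-demand price and the function name `summarize_prices` are invented for the example.

```python
from datetime import datetime
from decimal import Decimal


def summarize_prices(records, on_demand_price: Decimal, max_price: Decimal) -> dict:
    """Compare spot price history against the on-demand rate and a chosen maximum."""
    latest = max(records, key=lambda rec: rec["Timestamp"])
    current = Decimal(latest["SpotPrice"])
    at_or_below = sum(1 for rec in records if Decimal(rec["SpotPrice"]) <= max_price)
    return {
        "current_price": current,
        # Savings of the most recent spot price relative to on-demand, in percent.
        "savings_pct": (1 - current / on_demand_price) * 100,
        # How many historical price points would have satisfied the maximum.
        "records_at_or_below_max": at_or_below,
        "total_records": len(records),
    }


history = [
    {"SpotPrice": "0.0300", "Timestamp": datetime(2024, 1, 1)},
    {"SpotPrice": "0.0350", "Timestamp": datetime(2024, 1, 2)},
    {"SpotPrice": "0.0400", "Timestamp": datetime(2024, 1, 3)},
]
summary = summarize_prices(history, Decimal("0.1000"), Decimal("0.0350"))
print(summary)
```

In this made-up history, a maximum of $0.035 would have been satisfied by two of the three price points, which suggests the instance would not be running at the most recent price.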
Managing spot instances DIY (do-it-yourself)
AWS Spot Fleet enables you to manage a large group, or fleet, of spot instances with different allocation strategies (e.g. lowest price, diversified, capacity optimized), along with many other options. Making it all work well, however, requires a large amount of manual configuration, setup and maintenance.
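To give a sense of the configuration involved, here is a sketch that assembles a Spot Fleet request configuration as a plain dictionary. The field names follow the EC2 `RequestSpotFleet` API as I understand it, and only a small subset of the available options is shown; the role ARN, AMI ID and subnet IDs in the usage example are placeholders, and `build_spot_fleet_config` is a hypothetical helper.

```python
ALLOCATION_STRATEGIES = {"lowestPrice", "diversified", "capacityOptimized"}


def build_spot_fleet_config(
    fleet_role_arn: str,
    image_id: str,
    instance_types: list,
    subnet_ids: list,
    target_capacity: int,
    allocation_strategy: str = "capacityOptimized",
) -> dict:
    """Build a dict intended for the SpotFleetRequestConfig parameter."""
    if allocation_strategy not in ALLOCATION_STRATEGIES:
        raise ValueError(f"unknown allocation strategy: {allocation_strategy}")
    # One launch specification per (instance type, subnet) pair, so the fleet
    # can draw from several spot pools instead of a single one.
    launch_specs = [
        {"ImageId": image_id, "InstanceType": itype, "SubnetId": subnet}
        for itype in instance_types
        for subnet in subnet_ids
    ]
    return {
        "IamFleetRole": fleet_role_arn,
        "AllocationStrategy": allocation_strategy,
        "TargetCapacity": target_capacity,
        "Type": "maintain",  # replace interrupted instances to hold capacity
        "LaunchSpecifications": launch_specs,
    }


config = build_spot_fleet_config(
    fleet_role_arn="arn:aws:iam::123456789012:role/example-fleet-role",  # placeholder
    image_id="ami-00000000000000000",                                    # placeholder
    instance_types=["m5.large", "c5.large"],
    subnet_ids=["subnet-aaaa1111", "subnet-bbbb2222"],                   # placeholders
    target_capacity=4,
)
# With boto3, this dict would be passed along the lines of:
#   boto3.client("ec2").request_spot_fleet(SpotFleetRequestConfig=config)
```

Even this reduced example needs an IAM role, an image, and a launch specification for every pool you want to diversify across, which hints at why the full setup takes real effort to build and maintain.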
If you are looking for a turn-key solution to move more of your workloads to spot instances, with greater ease and confidence, here is a checklist of value-added functionality that you can get with Spot.io.
|Workload type|Functionality|AWS (DIY)|Spot.io|
|---|---|---|---|
|All|SLA for availability|None|99.99% availability|
|Containerized workloads|Container-driven autoscaling and bin-packing|Requires significant configuration. Also, requires multiple ASGs to accommodate instance size diversification.|Turn-key solution with optimized bin-packing, built in support for variable and dynamic instance sizes/types/life cycles, container autoscaler based on Pod/Task requirements.|
|Stateful workloads|Storage persistence|EBS volumes can be saved. Re-attachment to replacement instance is possible only when capacity is available in the same market, and provided the spot request is defined as “persistent”, or the spot fleet has “maintain” enabled.|Proactive identification of termination allows for reliable and automatic re-attachment of EBS volumes (same state) to replacement instances across instance types and sizes, or even AZs.|
|All|Graceful draining|With just a 2 minute warning of spot instance termination, applications and services might be interrupted mid-process.|Early prediction of spot instance termination allows for graceful draining and automatic workload relocation to new instance(s).|
|All|Automatic fallback to on-demand|Not supported|Fully automated for scenarios where there are no available spot instances.|
|All|Automatic return from on-demand to spot instance(s)|Not supported|Workload will be automatically moved back from on-demand as soon as appropriate spot instance type is available.|
|Containerized workloads|Cloud-native cost allocation for Kubernetes and ECS|Not supported|Cost allocation and showback at the container level by namespaces, resources, labels and annotations.|
|All|Proactive usage of available reservations and savings plans|Not supported|Workloads will always be prioritized to run on available savings plans & reserved instances, and will revert to spot instances for increased savings when applicable.|
|Containerized workloads|Vertical container rightsizing|Requires additional collection of metrics and manual analysis.|Real-time measurement of Pods’ and Tasks’ CPU and Memory consumption informs requirements for cost-efficient cluster deployments.|
|Containerized workloads|Customizable buffer of spare nodes for workloads that cannot wait for scaling|Not supported|Fully supported|
|Containerized workloads|Centralized management of multiple node groups|Requires management of multiple autoscaling groups, one per node group.|Single point of management for multiple worker node groups, each with their own launch specifications.|
|Containerized workloads|Declarative infrastructure|Node Lifecycle control for Pods requires manual configuration of Labels & Taints on each Node Group as well as matching tolerations on the pods.|Simply declare infrastructure requirements from the Pod specifications by using a single label.|
|Autoscaling workloads|Instance auto-recovery|AWS provides retroactive recovery, after spot instance termination (2 min advance notice) only with “Maintain” status, and depending on availability.|Proactive detection of spot instance termination triggers deployment of replacement instances, with recovery to different markets as relevant.|
|Stateful workloads|IP persistence|Supported only if the instance or fleet is defined as “persistent” or “maintain” respectively.|Fully Supported, across spot instance markets.|
|All|Preferred compute pool & network subnet prioritization|A structured, hierarchal priority list can be configured, but will follow the exact, defined order even when less than optimal.|Able to prioritize AZs and Instance Types within spot instance allocation strategy, to dynamically match workloads with optimal resources.|
|All|Support for AWS services and 3rd party integrations|Available via Auto Scaling Groups (ECS, EKS, Beanstalk).|Available with Beanstalk, EMR, CodeDeploy, OpsWorks, ELB/ALB, Route53, Chef, Jenkins, GitLab, Rancher, Docker Swarm, RightScale, D2iQ as well as automatically generated templates for Terraform, CloudFormation, and JSON|
|Stateful workloads|Instance auto-recovery|2 minutes notification before spot instance interruption, restricts duration of any shutdown processes. Additionally, only recovers to the same spot instance market.|Advanced prediction of interruption, allows for graceful termination and recovery to alternative spot instance types and even on-demand, ensures highest availability|