Stateful Applications with Spot Instances



Data integrity and consistency are crucial when managing workloads. They are easy to take for granted on On-Demand instances, but much less so on EC2 Spot Instances, which are ephemeral by design and can be revoked at any moment. At Spotinst, we thought hard about how you can leverage Spot while still handling data concerns easily and with confidence.


Using Elastigroup, you can specify whether your Spot Instances should restart or terminate when they are interrupted, choosing the interruption behavior that meets your needs. The default is to terminate Spot Instances when they are interrupted. To change the interruption behavior, choose an option under Stateful Configuration in the console, or set strategy.persistence in the API request.

  • Persist root and data volumes – simulates a restart of the node as part of instance replacement and keeps the same root volume, so your applications can pick up right where they left off.
  • Persist private IP addresses – keeps your instance’s ENI, so the instance is brought back up after a Spot interruption with the same private IP address and ENI.
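As a rough illustration of the API route mentioned above, the following Python sketch builds the strategy.persistence fragment of a request body. Only the strategy.persistence path comes from this article; the three field names inside it (shouldPersistRootDevice, shouldPersistBlockDevices, shouldPersistPrivateIp) and the helper build_persistence_strategy are assumptions made for illustration and should be checked against the Elastigroup API reference before use.

```python
import json


def build_persistence_strategy(persist_root=True, persist_data=True,
                               persist_private_ip=True):
    """Return a hypothetical `strategy` fragment for an Elastigroup request.

    The keys under "persistence" are assumed names mirroring the two
    console options described above; verify them against the API docs.
    """
    return {
        "strategy": {
            "persistence": {
                # Assumed field: keep the root volume across replacements.
                "shouldPersistRootDevice": persist_root,
                # Assumed field: keep the attached data volumes.
                "shouldPersistBlockDevices": persist_data,
                # Assumed field: keep the ENI and its private IP address.
                "shouldPersistPrivateIp": persist_private_ip,
            }
        }
    }


if __name__ == "__main__":
    # Print the fragment as JSON, ready to merge into a larger request body.
    print(json.dumps(build_persistence_strategy(), indent=2))
```

The fragment would be merged into the full group definition sent to the API; the rest of that request is outside the scope of this sketch.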

Multi-AZ and Spot to On-Demand Recovery

In case of a Spot interruption, Elastigroup will try to restart or launch your Spot Instance in a different Availability Zone, on a different instance type, or even under a different pricing model such as On-Demand, in order to maintain your cluster’s availability.

Please note:  

  • Multi-AZ environments use a snapshotting mechanism to make the volume available in all AZs configured in the Elastigroup (as described in Hot EBS Migration).

Automatic Scheduled Snapshots

Elastigroup allows you to create automatic, scheduled snapshots of your AMI and attached EBS volumes. With the Auto Backup feature, you can maintain data persistence within your cluster. In the case of any instance replacement, Elastigroup will use the last snapshot recorded according to the defined interval.
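To make the "last snapshot recorded" rule concrete, here is a minimal Python sketch of how the most recent snapshot before a replacement could be selected, and how stale that snapshot is. It is not Elastigroup's implementation; the function names and the example timestamps are hypothetical.

```python
from datetime import datetime


def latest_snapshot_before(snapshot_times, replaced_at):
    """Return the most recent snapshot time at or before `replaced_at`.

    Returns None when no snapshot has been taken yet.
    """
    candidates = [t for t in snapshot_times if t <= replaced_at]
    return max(candidates) if candidates else None


def snapshot_age(snapshot_times, replaced_at):
    """Return how old the restored snapshot is at replacement time."""
    latest = latest_snapshot_before(snapshot_times, replaced_at)
    return None if latest is None else replaced_at - latest


if __name__ == "__main__":
    # Hypothetical schedule: one snapshot every 6 hours.
    snapshots = [
        datetime(2017, 7, 12, 0, 0),
        datetime(2017, 7, 12, 6, 0),
        datetime(2017, 7, 12, 12, 0),
    ]
    interruption = datetime(2017, 7, 12, 9, 30)
    print(latest_snapshot_before(snapshots, interruption))
    print(snapshot_age(snapshots, interruption))
```

Under this model, changes made after the chosen snapshot are not part of the restored image, so the snapshot interval bounds how much recent state a replacement can lose.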

If you customized your instance with EBS volumes in addition to the root device volume, the new AMI contains block device mapping information for those volumes. When the instance is launched from this new AMI, it will automatically launch with those additional volumes.

If your application makes periodic changes or updates to the AMI and root volume, this is the most complete solution, as it simply creates new images at your desired frequency. It is a great option for application server clusters and for clusters running behind a load balancer.

Please note:  

  • The AMI backup is taken from a single instance in the group.
  • This is a great solution for Auto Scaling groups.

Use Cases

Please note:

  • The stateful configuration works best with shard-based clusters that have a replication factor greater than 1.
  • Verify that you can tolerate an instance being removed from the cluster for maintenance. During a Spot interruption, the instance will go through a ‘restart’.

Elasticsearch node recovery will take a fraction of the time required to provision a brand new instance, depending on the size of the attached data volumes. From the standpoint of your Elasticsearch cluster, the instance was only down for a maintenance restart. No changes to your cluster are necessary to support this, as long as you have enough instances for a quorum.
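The quorum condition can be checked with simple arithmetic. The sketch below assumes a plain majority quorum over master-eligible nodes, which is a common rule but is not stated in this article; the function names are hypothetical.

```python
def majority_quorum(master_eligible_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return master_eligible_nodes // 2 + 1


def can_tolerate_interruptions(master_eligible_nodes: int,
                               interrupted: int) -> bool:
    """True if the nodes still running can form a majority quorum."""
    remaining = master_eligible_nodes - interrupted
    return remaining >= majority_quorum(master_eligible_nodes)


if __name__ == "__main__":
    # With 3 master-eligible nodes the quorum is 2, so one node may restart.
    print(majority_quorum(3))
    print(can_tolerate_interruptions(3, 1))
    print(can_tolerate_interruptions(3, 2))
```

With three master-eligible nodes, one interrupted Spot Instance still leaves a majority; two simultaneous interruptions do not, which is one reason to spread nodes across Availability Zones and instance types.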


If your Cassandra node is replaced, we’ll clone the instance and bring it back. Your Cassandra cluster will behave as if the instance was down for some time. Bringing up a clone of the previous instance ensures that cluster IOPS are not wasted on bringing a new instance up.

Single Server Database

Stateful Spot suits single-server databases in non-production environments, where you do not have a requirement for 100% uptime for your database instances. For production, we recommend running a Master-Slave configuration, with the Master on On-Demand instances and the Slave on a Stateful Spot instance.

Hadoop cluster

Support for “Stateful Spot” instances in Spotinst Elastigroups allows you to provision Spot Instances and automatically recover the full state of the instance, including the private IP. When a recovery occurs, we automatically create a clone of the previous instance, and it will appear as if the instance was brought down for a restart. For instructions, please see: Hadoop use case


Kafka’s architecture is built from several components, each with its own role. All of these components can run on Spot Instances: brokers and ZooKeeper clusters, as well as consumers, can run on them seamlessly.