A Technical Introduction to Elastigroup: Automating Cloud Infrastructure for Elastic Applications

With the growing adoption of cloud computing, companies of all sizes are devoting more and more resources to managing their cloud infrastructure. As business demands scale, many find their cloud bills growing surprisingly fast. This makes optimizing cloud management and costs a necessity rather than a “nice to have”.

The core building blocks for deploying and scaling infrastructure for autoscaling applications on the three major cloud providers are AWS Auto Scaling Groups (ASGs), GCP Instance Groups and Azure Scale Sets. They provide the features needed to run highly available applications, yet configuring and maintaining them is not always optimized to minimize the effort or costs involved.

When it comes to cost savings, one of the first places to look is the discounted pricing models offered by the cloud providers themselves. These include spare-capacity-based offerings (such as AWS EC2 spot instances, Azure spot instances and GCP preemptible VMs) and commitment-based discounts (such as AWS reserved instances and savings plans).

Each of these options, though, comes with its own caveats. The resulting challenge for modern cloud ops teams is to use ASGs (or their counterparts) while navigating multiple pricing options, keeping workloads highly available, and making deployment processes as automated and efficient as possible.

Introducing Elastigroup

Elastigroup by Spot is a cloud infrastructure automation service that lets users provision, manage and scale compute instances for any elastic application or load-balanced workload, on top of spare and reserved cloud capacity, without compromising availability. Backed by an enterprise-grade SLA, even mission-critical and production workloads can benefit from up to 90% cost savings, with peace of mind.

Elastigroup provides every feature you’d expect from a standard cluster management platform, but raises the bar with predictive approaches to instance selection, auto-recovery and autoscaling, along with comprehensive dashboarding and advanced automation.

Moreover, Elastigroup integrates with the most popular services and provisioning tools, so customers don’t need to redesign their architecture or change existing practices to take advantage of it.

How does it work?

Elastigroup connects to your cloud account with a set of permissions (see Elastigroup’s AWS policy, for example) that allow it to launch and manage cloud resources from within the platform.

You can import existing AWS ASG, Azure Scale Set or GCP Instance Group workloads directly from your cloud account, or create a new Elastigroup from scratch; either way, the process takes just a few minutes.

Once Elastigroup is managing a workload, it keeps that workload available while continuously optimizing costs. It does this through a combination of technologies:

Spot instance market analysis

Drawing on a large dataset accumulated from running over 5 billion compute hours per month on spot instances, Elastigroup’s machine learning algorithms assign each “spot instance market”, or pool (a combination of instance type, size, OS and availability zone), an individual market score. This score is visible to all customers and helps them make decisions when first configuring a group. Elastigroup also uses the score to choose the best instance type to launch when scaling up, and which instance to terminate when scaling down, while taking other parameters such as price and user-defined preferences into account.
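The following minimal Python sketch illustrates how a market score can drive both decisions. Elastigroup’s actual scoring model is not public, so the score is treated here as a given input. The names Market, Instance, pick_market_for_scale_up and pick_instance_for_scale_down, the 0-100 score scale, the min_score cutoff and the tie-breaking order are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, Optional


@dataclass(frozen=True)
class Market:
    """One hypothetical spot market (pool): instance type + OS + availability zone."""
    instance_type: str
    os: str
    az: str
    score: float       # hypothetical 0-100 market score; higher means more stable
    spot_price: float  # illustrative USD per hour


@dataclass(frozen=True)
class Instance:
    instance_id: str
    market: Market


def pick_market_for_scale_up(
    markets: Iterable[Market],
    min_score: float = 50.0,
    preferred_types: AbstractSet[str] = frozenset(),
) -> Optional[Market]:
    """Return the market to launch into, or None if no market meets min_score.

    Assumed ranking: user-preferred instance types first, then higher
    score, then lower price.
    """
    candidates = [m for m in markets if m.score >= min_score]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda m: (m.instance_type not in preferred_types, -m.score, m.spot_price),
    )


def pick_instance_for_scale_down(instances: Iterable[Instance]) -> Optional[Instance]:
    """Terminate the instance in the lowest-scored market; ties go to the pricier one."""
    running = list(instances)
    if not running:
        return None
    return min(running, key=lambda i: (i.market.score, -i.market.spot_price))


# Illustrative data only; scores and prices are made up.
markets = [
    Market("m5.large", "linux", "us-east-1a", score=82, spot_price=0.035),
    Market("m5a.large", "linux", "us-east-1b", score=91, spot_price=0.031),
    Market("c5.large", "linux", "us-east-1a", score=40, spot_price=0.028),
]
print(pick_market_for_scale_up(markets).instance_type)  # m5a.large

fleet = [Instance("i-1", markets[0]), Instance("i-2", markets[2])]
print(pick_instance_for_scale_down(fleet).instance_id)  # i-2
```

Because a single sort key is used for each direction, scale-up favors the most stable markets and scale-down removes capacity from the least stable ones first. Both work from the same score.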

Proactive replacement

Elastigroup further applies its predictive rebalancing ML algorithms to spot market behavior data to predict spot instance interruptions up to an hour before AWS’s two-minute interruption notice. This allows at-risk instances to be drained gracefully, with their workloads migrated to instances with a longer expected lifetime, while the desired capacity is maintained.
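The sketch below shows the launch-before-drain ordering that lets capacity hold steady during proactive replacement. It is a toy model written under stated assumptions. The Group class, proactive_rebalance, the risk dictionary and the 0.7 threshold are hypothetical. The risk values stand in for a predictive model’s output and do not reproduce Elastigroup’s algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Group:
    """Toy model of a managed group that tracks serving instances by ID."""
    serving: List[str] = field(default_factory=list)
    terminated: List[str] = field(default_factory=list)
    launched: int = 0

    def launch_replacement(self) -> str:
        # Stand-in for launching a new instance into a healthier market.
        self.launched += 1
        new_id = f"i-new-{self.launched}"
        self.serving.append(new_id)
        return new_id

    def drain_and_terminate(self, instance_id: str) -> None:
        # Stand-in for deregistering from the load balancer, waiting for
        # in-flight requests to finish, then terminating the instance.
        self.serving.remove(instance_id)
        self.terminated.append(instance_id)


def proactive_rebalance(
    group: Group,
    interruption_risk: Dict[str, float],
    threshold: float = 0.7,
) -> List[str]:
    """Replace instances whose predicted interruption risk exceeds threshold.

    interruption_risk is a hypothetical probability of interruption within
    the next hour, per instance ID. Each replacement is launched before the
    at-risk instance is drained, so the number of serving instances never
    drops below where it started.
    """
    at_risk = [i for i in group.serving if interruption_risk.get(i, 0.0) > threshold]
    for instance_id in at_risk:
        group.launch_replacement()
        group.drain_and_terminate(instance_id)
    return at_risk


group = Group(serving=["i-a", "i-b", "i-c"])
print(proactive_rebalance(group, {"i-a": 0.9, "i-b": 0.2}))  # ['i-a']
print(group.serving)  # ['i-b', 'i-c', 'i-new-1']
```

Draining only after the replacement is serving is what makes the migration graceful. Simply reacting to the two-minute notice leaves no time for this ordering.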

Proactive replacement of spot instances to avoid interruptions and outages

Fallback to on-demand

To complement its analytical and predictive abilities, Elastigroup can detect when no spare capacity is available in the predefined markets and fall back to on-demand instances to meet the target capacity. This means your workload stays up and running even when no spot instances can be launched. When spare capacity becomes available again, Elastigroup automatically reverts to it, maximizing savings with no effort on your part.
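The fallback logic can be sketched as a "spot first, on-demand for the shortfall" capacity plan that is recomputed whenever availability changes. This is a simplified illustration. The names CapacityPlan, plan_capacity and on_demand_to_revert are hypothetical, as is the idea of knowing per-market spot availability up front. In practice, availability is discovered by attempting launches.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CapacityPlan:
    spot: int
    on_demand: int


def plan_capacity(target: int, spot_available: Mapping[str, int]) -> CapacityPlan:
    """Fill target from spot markets first and cover any shortfall on-demand.

    spot_available maps each predefined market to how many spot instances
    can currently be launched there (a hypothetical input).
    """
    if target < 0:
        raise ValueError("target must be non-negative")
    spot = min(target, sum(max(0, n) for n in spot_available.values()))
    return CapacityPlan(spot=spot, on_demand=target - spot)


def on_demand_to_revert(current: CapacityPlan, desired: CapacityPlan) -> int:
    """How many running on-demand instances can be swapped back to spot."""
    return max(0, current.on_demand - desired.on_demand)


# No spare capacity in any predefined market: everything falls back to on-demand.
now = plan_capacity(10, {"m5.large/us-east-1a": 0, "m5a.large/us-east-1b": 0})
print(now)  # CapacityPlan(spot=0, on_demand=10)

# Spare capacity returns: the plan shifts back to spot.
later = plan_capacity(10, {"m5.large/us-east-1a": 6, "m5a.large/us-east-1b": 8})
print(later)  # CapacityPlan(spot=10, on_demand=0)
print(on_demand_to_revert(now, later))  # 10
```

The target stays fixed throughout; only the mix between spot and on-demand changes. This is why the workload keeps running while the pricing model shifts underneath it.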

Intelligent reserved capacity utilization

Elastigroup continuously scans your cloud account for unused reservations that match its configured instance types. When it finds them, it uses them first, since that capacity has already been paid for. If matching on-demand instances are later launched elsewhere in your account (outside Elastigroup), Elastigroup moves the relevant workloads back to spot instances so that the reservations can apply to those other instances instead. This keeps reservations fully utilized and optimizes spend across your cloud account.
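The reservation behavior described above reduces to a small allocation rule: reservations go first to matching on-demand instances outside the group, and the group claims only what is left. The sketch below illustrates that rule. It is a deliberate simplification. Real reservation matching also depends on region, availability zone, platform, tenancy and size flexibility, and here it is keyed by instance type alone. The function name reserved_allocation is hypothetical.

```python
from typing import Dict, Mapping


def reserved_allocation(
    reservations: Mapping[str, int],
    external_on_demand: Mapping[str, int],
    group_demand: Mapping[str, int],
) -> Dict[str, int]:
    """Per instance type, how many group instances should use reserved capacity.

    Reservations are credited first to matching on-demand instances running
    outside the group, which would otherwise pay full price. The group only
    claims reservations that are left unused; the rest of its capacity
    stays on spot.
    """
    allocation: Dict[str, int] = {}
    for itype, wanted in group_demand.items():
        unused = reservations.get(itype, 0) - external_on_demand.get(itype, 0)
        allocation[itype] = max(0, min(wanted, unused))
    return allocation


reservations = {"m5.large": 5}

# No other matching on-demand usage: the group uses all 5 reservations.
print(reserved_allocation(reservations, {}, {"m5.large": 8}))  # {'m5.large': 5}

# 3 matching on-demand instances launch elsewhere: the group releases 3
# reservations and moves those instances back to spot.
print(reserved_allocation(reservations, {"m5.large": 3}, {"m5.large": 8}))  # {'m5.large': 2}
```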

Auto healing

Elastigroup can assess the health of the instances it manages using preconfigured health checks, and replaces any unhealthy instances to keep the service available. Health checks can be based on cloud provider criteria or third-party integrations, or can even be custom-defined.
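The sketch below shows how several health checks (provider-level, third-party or custom) can be combined into an auto-healing loop. It assumes a common pattern that the article does not spell out: an instance is replaced only after failing a number of consecutive rounds, to avoid reacting to a single blip. AutoHealer, the check functions and the threshold of 3 are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A health check takes an instance ID and reports whether it looks healthy.
HealthCheck = Callable[[str], bool]


@dataclass
class AutoHealer:
    checks: List[HealthCheck]
    unhealthy_threshold: int = 3  # assumed: consecutive failed rounds before replacement
    failures: Dict[str, int] = field(default_factory=dict)

    def evaluate(self, instance_ids: List[str]) -> List[str]:
        """Run one round of checks and return the IDs that should be replaced."""
        to_replace = []
        for iid in instance_ids:
            healthy = all(check(iid) for check in self.checks)
            self.failures[iid] = 0 if healthy else self.failures.get(iid, 0) + 1
            if self.failures[iid] >= self.unhealthy_threshold:
                to_replace.append(iid)
                del self.failures[iid]  # forget the replaced instance's history
        return to_replace


def cloud_status_ok(iid: str) -> bool:
    # Hypothetical stand-in for a provider-level status check.
    return iid != "i-bad"


def app_endpoint_ok(iid: str) -> bool:
    # Hypothetical stand-in for a custom application-level check.
    return True


healer = AutoHealer(checks=[cloud_status_ok, app_endpoint_ok])
for _ in range(3):
    print(healer.evaluate(["i-good", "i-bad"]))
# [] then [] then ['i-bad']
```

A passing round resets an instance’s failure count. This means only sustained failures trigger a replacement.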

Stateful support

In addition to stateless and autoscaling use cases, Elastigroup can manage stateful, single-instance workloads on spot instances. Any fault-tolerant use case can benefit from the cost savings of spare capacity without losing a byte of data: instance resources such as the root volume, data volumes and IP address are automatically backed up and migrated to the new machine upon replacement.
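The following toy model shows what "migrating" a stateful instance means in terms of identity. The replacement gets a new instance ID but boots from a snapshot of the old root volume and keeps the same data volumes and private IP. StatefulInstance, snapshot_root and replace_stateful are hypothetical names. The model deliberately leaves out the real detach and attach calls and their ordering.

```python
import itertools
from dataclasses import dataclass, replace
from typing import Tuple

_new_ids = itertools.count(1)


@dataclass(frozen=True)
class StatefulInstance:
    instance_id: str
    root_image: str                # image or snapshot the root volume boots from
    data_volumes: Tuple[str, ...]  # persistent data volume IDs
    private_ip: str


def snapshot_root(inst: StatefulInstance) -> str:
    # Hypothetical stand-in for snapshotting the root volume before replacement.
    return f"snap-of-{inst.instance_id}"


def replace_stateful(old: StatefulInstance) -> StatefulInstance:
    """Describe the replacement for old, carrying over its persistent state.

    The new machine boots from a fresh snapshot of the old root volume and
    keeps the same data volumes and private IP, so only the instance ID
    changes from the application's point of view.
    """
    return replace(
        old,
        instance_id=f"i-new-{next(_new_ids)}",
        root_image=snapshot_root(old),
    )


old = StatefulInstance("i-0abc", "ami-base", ("vol-data1",), "10.0.1.25")
new = replace_stateful(old)
print(new)
# StatefulInstance(instance_id='i-new-1', root_image='snap-of-i-0abc', data_volumes=('vol-data1',), private_ip='10.0.1.25')
```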

Getting started with Elastigroup

If you have ever considered using EC2 spot instances on your own, you will find that Elastigroup by Spot is a turnkey solution for your needs, already used by a large number of satisfied customers to optimize their cloud.

Here are some of the factors that make Elastigroup easy to deploy and integrate with your existing cloud environment:

Third-party service integrations. Elastigroup supports a wide variety of use cases and integrates with cloud provider and third-party services such as Load Balancing, Elastic Beanstalk, CodeDeploy, AWS Batch, Docker Swarm, EMR, Jenkins, Chef, GitLab and more.

Provisioning tools & SDKs. Spot partners with a variety of solution providers to make Elastigroup and its other products as accessible as possible. Supported tools include Terraform, CloudFormation, Ansible and a Python SDK, among others.

Setup automation. Migrating existing workloads is straightforward, with tools for assessing your existing infrastructure built into the Spot platform. Customers can use Spot’s Cloud Analyzer to scan their accounts, identify workloads that Elastigroup can optimize, and import them with the click of a button.

The Spot platform is supported by a dedicated team of cloud engineers and solution architects. Together with 24/7 support and clear SLAs, this lets customers rely on the platform to optimize their cloud environment.

If you want to quickly learn how to get started with Elastigroup, please see one of our video tutorials. For technical documentation and APIs, visit our documentation portal.