Understanding EC2 Auto Scaling Groups

What is an EC2 Auto Scaling Group?

An EC2 auto scaling group is a logical collection of Amazon EC2 instances that are managed and scaled as a unit. Auto scaling groups let you use core features of the EC2 Auto Scaling service, including health checks, minimum and maximum group sizes, and scaling policies.

EC2 Auto Scaling mainly does two things: it maintains a specified number of instances in a group, and it automatically increases or decreases the size of the group in response to factors such as application load or a predetermined schedule.

EC2 Auto Scaling is also available through the broader AWS Auto Scaling service, which can help you scale multiple AWS services, including ECS, DynamoDB and Aurora.

This is part of a series of articles about AWS Autoscaling.

In this article, you will learn:

How Do Auto Scaling Groups Work?
Auto Scaling Groups and Availability Zones
EC2 Auto Scaling Groups with Multiple Instance Types
Tagging Auto Scaling Groups and Instances
Elastic Load Balancing and Auto Scaling Groups
EC2 Autoscaling with Spot Elastigroup

How Do Auto Scaling Groups Work?

The size of your auto scaling group is maintained according to a pre-defined number of instances, which you configure as the desired capacity. You can resize the group manually or automatically as application requirements change.

Initially, an auto scaling group launches enough instances to reach the desired capacity. By default, it maintains this number of instances by performing regular health checks, terminating any instances it finds unhealthy, and launching replacements.
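The replacement behavior described above can be pictured as a small reconciliation loop. The following is a toy model in plain Python, not the service's implementation; the Instance class and the launch and reconcile functions are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from itertools import count

_ids = count(1)  # hypothetical ID generator for this toy model


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


def launch():
    """Stand-in for launching a real EC2 instance."""
    return Instance(instance_id=f"i-{next(_ids):04d}")


def reconcile(group, desired):
    """Drop unhealthy instances, then launch or trim to reach `desired`."""
    survivors = [i for i in group if i.healthy]  # unhealthy ones are terminated
    while len(survivors) < desired:              # launch replacements
        survivors.append(launch())
    return survivors[:desired]                   # trim any excess


group = reconcile([], desired=3)     # initial launch up to the configured size
group[1].healthy = False             # one instance fails its health check
group = reconcile(group, desired=3)  # the failed instance is replaced
print([i.instance_id for i in group])
```

Running the loop again after a failed health check returns the group to its configured size with a fresh instance in place of the unhealthy one.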

Beyond this basic functionality, you can change the group size in the following ways, most of which are configured as scaling policies:

Manual scaling
Maintaining a fixed, pre-defined number of instances
Target tracking scaling, which keeps a specific load metric at a target value
Step scaling, based on multiple thresholds of a load metric
Simple scaling, which changes capacity by a fixed increment
Scaling based on the length of an SQS queue
Scheduled scaling

When a policy is active, the auto scaling group changes the number of instances dynamically, keeping it between the minimum and maximum values defined for your group.
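To make the difference between target tracking and step scaling concrete, here is a minimal sketch of how each might compute a new group size before it is clamped to the group's limits. The proportional formula used for target tracking is a simplifying assumption, not the algorithm AWS uses, and all function names are hypothetical.

```python
import math


def clamp(value, minimum, maximum):
    """Keep a proposed capacity inside the group's min/max bounds."""
    return max(minimum, min(value, maximum))


def target_tracking_capacity(current, metric, target, minimum, maximum):
    """Assumed proportional rule: size the group so the metric returns to target."""
    wanted = math.ceil(current * metric / target)
    return clamp(wanted, minimum, maximum)


def step_scaling_capacity(current, metric, steps, minimum, maximum):
    """steps: list of (lower, upper, adjustment); upper=None means no upper limit."""
    for lower, upper, adjustment in steps:
        if metric >= lower and (upper is None or metric < upper):
            return clamp(current + adjustment, minimum, maximum)
    return current  # metric is below every threshold, so nothing changes


# Illustrative thresholds: +1 instance at 60-80% CPU, +3 instances above 80%.
steps = [(60, 80, 1), (80, None, 3)]
print(target_tracking_capacity(4, metric=75, target=50, minimum=1, maximum=10))
print(step_scaling_capacity(4, metric=85, steps=steps, minimum=1, maximum=6))
```

In the second call the step adjustment would take the group to seven instances, but the maximum of six caps the result.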

Auto scaling groups use launch templates to define how new instances are launched. A launch template can specify a single instance type, and the group can override it with a list of several instance types. You can also assign a different launch template to individual instance types in that list.
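As a rough illustration of how a group ties together a launch template and its size limits, the sketch below builds the parameters you might pass to the CreateAutoScalingGroup API, for example through boto3's create_auto_scaling_group. It only constructs a dictionary and makes no AWS calls. The function name, group name, template name and subnet IDs are placeholders.

```python
def build_group_request(name, template_name, subnet_ids, minimum, maximum, desired):
    """Assemble CreateAutoScalingGroup parameters without calling AWS."""
    if not minimum <= desired <= maximum:
        raise ValueError("desired capacity must lie between minimum and maximum")
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {
            "LaunchTemplateName": template_name,
            "Version": "$Latest",  # always use the newest template version
        },
        "MinSize": minimum,
        "MaxSize": maximum,
        "DesiredCapacity": desired,
        # Subnets are passed as one comma-separated string.
        "VPCZoneIdentifier": ",".join(subnet_ids),
    }


request = build_group_request(
    "web-asg",                           # placeholder group name
    "web-template",                      # placeholder launch template name
    ["subnet-aaa111", "subnet-bbb222"],  # placeholder subnet IDs
    minimum=2, maximum=6, desired=3,
)
print(request["VPCZoneIdentifier"])
```

With credentials configured, a dictionary like this could be unpacked into the client call as keyword arguments.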

Learn more in our article about EC2 auto scaling.

Auto Scaling Groups and Availability Zones

You can span an auto scaling group across multiple availability zones (AZs) in a Region. You can then attach a load balancer, which distributes incoming traffic evenly across all of the AZs you have enabled.

If an AZ becomes unavailable or unhealthy, Auto Scaling launches replacement instances in an unaffected AZ. When it launches instances, it tries the AZ with the fewest instances first. If that attempt fails, it tries the other AZs until it succeeds.

When you need to expand the availability of your application, you can add an AZ to your auto scaling group. Remember to enable the AZ for the relevant load balancer as well. After the new AZ is enabled, the load balancer starts routing traffic evenly across all enabled AZs.
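The AZ selection described above can be sketched as "fewest instances first, with fallback". This is a simplified model with hypothetical names: it makes a single pass over the AZs, whereas the service keeps retrying, and the try_launch callback stands in for a real launch attempt.

```python
def launch_balanced(counts, try_launch):
    """Launch one instance in the emptiest AZ that accepts it.

    counts: dict mapping AZ name to its current instance count.
    try_launch: callable taking an AZ name, returning True on success.
    """
    # Visit AZs from fewest to most instances; break ties by name.
    for az in sorted(counts, key=lambda zone: (counts[zone], zone)):
        if try_launch(az):
            counts[az] += 1
            return az
    return None  # every AZ refused the launch in this pass


counts = {"us-east-1a": 2, "us-east-1b": 1, "us-east-1c": 2}
unavailable = {"us-east-1b"}  # pretend this AZ is currently unhealthy
chosen = launch_balanced(counts, lambda az: az not in unavailable)
print(chosen, counts)
```

Here the emptiest AZ is unavailable, so the launch falls through to the next candidate.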

Note that while auto scaling groups can contain instances from multiple AZs, all AZs must be located in the same Region. This functionality does not support the use of multiple Regions.

Here are other limitations you might want to consider when choosing AZs:

When enabling an AZ for your load balancer, you specify one subnet from that AZ. You can select at most one subnet per AZ.
Any subnets specified for Internet-facing load balancers must have at least eight available IP addresses.
Application load balancers need at least two enabled AZs.
It is not possible to disable any enabled AZs for network load balancers. What you can do is enable additional AZs.
Gateway load balancers do not support any changes to AZs or subnets that were added when the load balancer was created.

EC2 Auto Scaling Groups with Multiple Instance Types

Auto scaling groups let you launch a fleet composed of on-demand EC2 instances and spot instances. There are multiple ways to get discounted rates for instances while auto scaling:

Spot instances, which are spare EC2 capacity, offer a discount of up to 90% compared to on-demand instances.
You can also apply Reserved Instances or AWS Savings Plans to the on-demand instances in an auto scaling group to get discounted rates.

Mixing different types of instances also improves availability:

Deploy applications across multiple availability zones (AZs).
Scale an application using more than one instance type, which reduces the chance that capacity will be unavailable for your chosen instance type in your selected AZ.
Combine on-demand instances and spot instances, so that there is always at least a base of on-demand instances. For spot instances, if there is insufficient capacity in one pool, EC2 Auto Scaling looks to other spot instance pools, making it more likely that you can use low-cost spot instances. If no spot capacity is available, the spot portion of the group is not launched until capacity returns.
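A group that mixes purchase options and instance types is typically described with a mixed instances policy. The sketch below assembles one as a plain dictionary, using the field names of the EC2 Auto Scaling API, and makes no AWS calls. The function name, template name and instance types are illustrative placeholders.

```python
def mixed_instances_policy(template_name, instance_types, on_demand_base,
                           on_demand_pct_above_base):
    """Build a MixedInstancesPolicy structure for an auto scaling group."""
    if not 0 <= on_demand_pct_above_base <= 100:
        raise ValueError("percentage must be between 0 and 100")
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": template_name,
                "Version": "$Latest",
            },
            # Each override adds an instance type the group may launch.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            # Instances that are always on-demand, however the group scales.
            "OnDemandBaseCapacity": on_demand_base,
            # Share of capacity beyond the base that is on-demand; the rest is spot.
            "OnDemandPercentageAboveBaseCapacity": on_demand_pct_above_base,
        },
    }


policy = mixed_instances_policy(
    "web-template", ["m5.large", "c5.large", "r5.large"],
    on_demand_base=2, on_demand_pct_above_base=50,
)
print(policy["InstancesDistribution"])
```

With these example values, the first two instances are always on-demand and capacity beyond that is split evenly between on-demand and spot.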

Learn more about mixing EC2 Autoscaling Groups and spot instances.

Allocation Strategies

There are several allocation strategies used by auto scaling groups to add specific types of instances to a group:

On-demand instances: this allocation strategy looks at the instance types listed for the group, in the order they are listed. It fulfills the on-demand capacity of the auto scaling group with the first instance type listed, and if that type isn’t available, the second, the third, and so on.
Capacity-optimized spot instances: this allocation strategy adds instances from the spot instance pools with the most available capacity for the number of instances being launched. It looks at real-time data about spot capacity, with the aim of choosing pools that are least likely to be interrupted.
Lowest price spot instances: this allocation strategy adds spot instances from the lowest-priced spot instance pools at the time the group is scaled, spread across the number of pools you define. You can define which instance types and which availability zones the spot instances are drawn from.
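Two of these strategies are simple enough to model in a few lines. The sketch below is a toy illustration with hypothetical function names and made-up prices, not the service's actual selection logic. Capacity-optimized is left out because it depends on capacity data that only AWS has.

```python
def pick_prioritized(priority, available):
    """On-demand, in listed order: return the first type that has capacity."""
    for instance_type in priority:
        if instance_type in available:
            return instance_type
    return None


def pick_lowest_price_pools(prices, pool_count):
    """Spot, lowest price: return the `pool_count` cheapest pools.

    prices: dict mapping (instance_type, az) to an hourly price.
    """
    return sorted(prices, key=prices.get)[:pool_count]


priority = ["c5.large", "m5.large", "r5.large"]
print(pick_prioritized(priority, available={"m5.large", "r5.large"}))

# Made-up prices, for illustration only.
prices = {
    ("c5.large", "us-east-1a"): 0.035,
    ("m5.large", "us-east-1a"): 0.031,
    ("m5.large", "us-east-1b"): 0.040,
}
print(pick_lowest_price_pools(prices, pool_count=2))
```

Note that a pool here is a combination of instance type and AZ, which is why the same instance type can appear in more than one pool.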

Spot Instance Interruption and Rebalance

Even when spot instances are mixed with on-demand instances in an auto scaling group, the spot instances can still be interrupted. When a spot instance is at elevated risk of interruption, EC2 sends an instance rebalance recommendation.

Amazon explicitly warns that this recommendation does not always arrive in advance. It may arrive together with the spot instance interruption notice, which gives you two minutes to move workloads off the instance before it is interrupted.

Once you receive a rebalance recommendation or an interruption notice, you can choose to:

Move your workload to another spot instance which is not at risk of interruption
Move the workload to a higher-cost on-demand instance
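On the instance itself, both signals are exposed through the instance metadata service. The sketch below shows one way to turn the two responses into a decision. It does not contact the metadata service: the bodies are passed in as strings, and the sample payloads are modelled on the documented shape with invented timestamps, so treat the exact fields as an assumption to verify. The next_step function is a hypothetical helper.

```python
import json
from datetime import datetime, timezone

# Instance metadata paths for the two signals.
REBALANCE_PATH = "/latest/meta-data/events/recommendations/rebalance"
INTERRUPTION_PATH = "/latest/meta-data/spot/instance-action"


def next_step(rebalance_body, interruption_body, now):
    """Decide what to do from the two metadata responses (None if absent)."""
    if interruption_body:
        notice = json.loads(interruption_body)
        deadline = datetime.strptime(
            notice["time"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        # An interruption notice wins: report how many seconds are left to drain.
        return "drain", (deadline - now).total_seconds()
    if rebalance_body:
        return "rebalance", None  # elevated risk: start moving work proactively
    return "continue", None


now = datetime(2024, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
sample_notice = '{"action": "terminate", "time": "2024-05-01T12:02:00Z"}'
sample_rebalance = '{"noticeTime": "2024-05-01T11:58:00Z"}'
print(next_step(sample_rebalance, sample_notice, now))
print(next_step(sample_rebalance, None, now))
```

Checking the interruption notice first reflects the point above that both signals can arrive at the same time.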

You can enable an option called Capacity Rebalancing for an EC2 auto scaling group. With it enabled, Auto Scaling automatically attempts to replace a spot instance that has received a rebalance recommendation with a new instance that has not. You can also create a lifecycle hook to run your own actions, such as draining connections, before the old instance is terminated.
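As a hedged illustration, the two settings could be expressed as request parameters for the UpdateAutoScalingGroup and PutLifecycleHook APIs. The sketch only builds dictionaries and makes no AWS calls. The function name, group name, hook name and timeout are placeholders chosen for the example.

```python
def capacity_rebalance_requests(group_name, hook_name, drain_seconds):
    """Return parameters for enabling rebalancing and a termination hook."""
    update_group = {
        "AutoScalingGroupName": group_name,
        "CapacityRebalance": True,  # replace at-risk spot instances proactively
    }
    termination_hook = {
        "AutoScalingGroupName": group_name,
        "LifecycleHookName": hook_name,
        # Pause instances on their way out so the workload can be drained.
        "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
        "HeartbeatTimeout": drain_seconds,
        "DefaultResult": "CONTINUE",  # proceed with termination after the timeout
    }
    return update_group, termination_hook


update_group, termination_hook = capacity_rebalance_requests(
    "web-asg", "drain-before-terminate", drain_seconds=90
)
print(update_group)
print(termination_hook["LifecycleTransition"])
```

The example keeps the drain window short because an interrupted spot instance has only about two minutes before it is reclaimed.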

Learn more about predictive rebalancing for spot instances.

Tagging Auto Scaling Groups and Instances

You can use tags to classify auto scaling groups—for example to indicate their purpose, the environment they run in, or their owner. You can add several tags to a single group, and specify that tags should also be applied to the individual EC2 instances inside the group. This can help break down instance costs in your EC2 bill.
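For example, a set of group tags that is also applied to instances could be assembled as below. The structure follows the tag format of the EC2 Auto Scaling API, as used by operations such as CreateOrUpdateTags, and no AWS call is made. The function name, tag keys and tag values are illustrative.

```python
def group_tags(group_name, tags, propagate=True):
    """Turn a plain dict of tags into auto scaling group tag definitions."""
    return [
        {
            "ResourceId": group_name,
            "ResourceType": "auto-scaling-group",
            "Key": key,
            "Value": value,
            # When True, instances launched by the group receive the tag too.
            "PropagateAtLaunch": propagate,
        }
        for key, value in tags.items()
    ]


tags = group_tags("web-asg", {"environment": "production", "owner": "platform-team"})
for tag in tags:
    print(tag["Key"], tag["Value"], tag["PropagateAtLaunch"])
```

Propagated tags such as environment and owner are the ones that let you group instance costs in billing reports.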

Elastic Load Balancing and Auto Scaling Groups

Amazon Elastic Load Balancing (ELB) is used to automatically distribute incoming traffic to EC2 instances, so that no single instance is overloaded. You can register an auto scaling group with a load balancer. This allows load balancing to occur between all the members of the auto scaling group.

When an auto scaling group is attached to a load balancer, all incoming requests go to the load balancer, which routes the traffic to one of the instances in the group. Instances that the group launches are registered with the load balancer automatically, and instances that it terminates are deregistered, so you do not need to manage registration yourself.

Once you’ve attached a load balancer to an auto scaling group, you can configure the group to scale based on ELB metrics, such as the number of requests per target. You can also have the group use the load balancer’s health checks, so that instances the load balancer reports as unhealthy are replaced.
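Both options can be sketched as request parameters: one set for switching the group to ELB health checks (UpdateAutoScalingGroup) and one for a target tracking policy on requests per target (PutScalingPolicy). The code only builds dictionaries and makes no AWS calls. The function name, group name, grace period, target value and resource label are placeholders, and the label must be replaced with the one for your own load balancer and target group.

```python
def elb_scaling_requests(group_name, resource_label, requests_per_target):
    """Return parameters for ELB health checks and a request-count policy."""
    health_check = {
        "AutoScalingGroupName": group_name,
        "HealthCheckType": "ELB",       # trust the load balancer's health checks
        "HealthCheckGracePeriod": 300,  # seconds to wait after launch (placeholder)
    }
    policy = {
        "AutoScalingGroupName": group_name,
        "PolicyName": f"{group_name}-requests-per-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ALBRequestCountPerTarget",
                "ResourceLabel": resource_label,
            },
            "TargetValue": float(requests_per_target),
        },
    }
    return health_check, policy


health_check, policy = elb_scaling_requests(
    "web-asg",
    "app/my-alb/0123456789abcdef/targetgroup/my-targets/fedcba9876543210",  # placeholder
    requests_per_target=1000,
)
print(policy["PolicyName"], policy["TargetTrackingConfiguration"]["TargetValue"])
```

With a policy like this, the group adds instances when the average request count per instance rises above the target and removes them when it falls below.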

You can use the following types of load balancers with EC2 auto scaling groups:

Application Load Balancer: supports routing and load balancing of HTTP/HTTPS traffic, as well as routing based on URL path. Only supports virtual private cloud (VPC).
Classic Load Balancer: performs routing and load balancing of TCP/SSL or HTTP/HTTPS traffic. Supports both EC2-Classic, which AWS has since retired, and VPC.
Network Load Balancer: routes traffic at layer 4 (TCP/UDP), based on address information in the packet headers.
Gateway Load Balancer: distributes traffic to groups of instances or appliances such as firewalls and intrusion prevention/detection systems. Supports the GENEVE protocol, which is used by many appliances.

Your load balancer and auto scaling group need to be in the same Region. Additionally, the load balancer’s target group must use the instance target type, not the ip target type, in order to work with an auto scaling group.

EC2 Autoscaling with Spot Elastigroup

Elastigroup provides AI-driven prediction of spot instance interruptions, and automated workload rebalancing with an optimal blend of spot, reserved and on-demand instances. It lets you leverage spot instances to reduce costs in AWS, even for production and mission-critical workloads, with low management overhead.

Key features of Elastigroup include:

Predictive rebalancing—identifies spot instance interruptions up to an hour in advance, allowing for graceful draining and workload placement on new instances, whether spot, reserved or on-demand.
Advanced auto scaling—simplifies the process of defining scaling policies, identifying peak times, automatically scaling to ensure the right capacity in advance.
Optimized cost and performance—keeps your cluster running at the best possible performance while using the optimal mix of on-demand, spot and reserved instances.
Enterprise-grade SLAs—constantly monitors and predicts spot instance behavior, capacity trends, pricing, and interruption rates. Acts in advance to add capacity whenever there is a risk of interruption.
Intelligent utilization of AWS Savings Plans and RIs—ensures that whenever there are unused reserved capacity resources, these will be used before spinning up new spot instances, driving maximum cost-efficiency.
Visibility—lets you visualize cluster activity and costs, with live views of potential and actual costs, resource utilization, and running instances.
Application aware—matches scaling behavior to the type of workload, can add or remove servers from load balancers, use health checks to monitor health, and provide excess capacity for stateful applications without risking data integrity.

Learn more about Spot.io Elastigroup.