EC2 Autoscaling: The Basics and 4 Best Practices

What is EC2 Auto Scaling?

Traditional IT environments are limited to a fixed number of servers for any given application. As the number of requests increases, so does the load on those servers. Eventually, the increased demand degrades performance and can cause failures. Amazon Elastic Compute Cloud (EC2) provides an Auto Scaling service that overcomes this challenge. 

Auto Scaling makes sure there are enough EC2 instances to run your applications. Before the service can run, you define auto scaling groups. For each group, you specify a minimum and maximum number of EC2 instances. Auto Scaling then detects when an instance fails or becomes unhealthy, and immediately launches another instance to maintain the required capacity.

Amazon EC2 also offers dynamic auto scaling policies, based on load metrics, CloudWatch alarms, events from other Amazon services such as SQS, or a fixed schedule.

EC2 Auto Scaling is part of the AWS Auto Scaling service, which provides automatic scalability for several Amazon services.

EC2 Auto Scaling Components

There are three key components involved in EC2 Auto Scaling:

Auto scaling groups

Groups organize EC2 instances into logical units for scaling and management purposes. When creating an auto scaling group, you specify the minimum, maximum, and desired number of EC2 instances you need.

Learn more in our guide to EC2 auto scaling groups
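As a sketch, the core sizing settings of a group map directly onto the parameters of the CreateAutoScalingGroup API. All names and IDs below are illustrative placeholders, not values from this article:

```python
# Hypothetical parameters for a CreateAutoScalingGroup API call.
# Names, IDs, and sizes are placeholders for illustration only.
create_asg_params = {
    "AutoScalingGroupName": "web-asg",      # logical unit for scaling/management
    "MinSize": 2,                           # never fewer than 2 instances
    "MaxSize": 10,                          # never more than 10 instances
    "DesiredCapacity": 4,                   # the number the group tries to maintain
    "LaunchTemplate": {
        "LaunchTemplateName": "web-template",
        "Version": "$Latest",
    },
    "VPCZoneIdentifier": "subnet-aaaa,subnet-bbbb",  # spread across subnets/AZs
}
```

Auto Scaling requires that MinSize ≤ DesiredCapacity ≤ MaxSize; the group then adds or removes instances to keep the running count at the desired capacity.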

Launch Templates

The launch template is a new way to configure auto scaling, replacing launch configurations, which are still supported as a legacy option.

Launch templates specify configuration information for new instances created in an auto scaling group. This includes the Amazon Machine Image (AMI) to use when creating the instance, security groups, and key pair. 

You can use versioning to define a common subset of parameters and reuse it across launch templates. For example, you can create a default template that specifies common configuration values, then programmatically insert different values to create new versions of the template.
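The versioning pattern above can be sketched in plain Python: a default template holds the common values, and a new version overrides only what changes. All names and IDs are hypothetical:

```python
# Sketch: a default launch template plus a derived version that overrides
# only the instance type. All identifiers are illustrative placeholders.
default_template = {
    "LaunchTemplateName": "base-web",
    "LaunchTemplateData": {
        "ImageId": "ami-0123456789abcdef0",   # AMI used for new instances
        "InstanceType": "t3.micro",
        "KeyName": "ops-keypair",             # key pair for SSH access
        "SecurityGroupIds": ["sg-11111111"],
    },
}

def new_version(template, overrides):
    """Derive a new template version: common values plus specific overrides."""
    data = {**template["LaunchTemplateData"], **overrides}
    return {
        "LaunchTemplateName": template["LaunchTemplateName"],
        "SourceVersion": "1",
        "LaunchTemplateData": data,
    }

v2 = new_version(default_template, {"InstanceType": "c5.large"})
```

Only the instance type differs between versions; the AMI, key pair, and security groups are inherited from the default template.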

Scaling options

EC2 Auto Scaling provides several ways to scale an instance group:

  • Manual scaling—attaching or detaching instances to the auto scaling group.
  • Maintaining a defined number of instances—scaled according to your specifications for minimum, maximum, and preferred or desired number of instances.
  • Target tracking—enables dynamic scaling according to a specified load metric target value.
  • Step scaling policies—specify several thresholds of a certain metric, and perform a scaling job when each threshold is reached.
  • Simple scaling policies—decrease and increase the capacity of the group by a specific instance number or percentage.
  • Scaling based on SQS—scaling up a group based on load in an SQS queue.
  • Scheduled scaling—performing a scaling event during specific dates and times.
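To make step scaling concrete, the logic can be sketched as mapping a metric value to a capacity change through an ordered set of thresholds, mirroring the StepAdjustment entries of a step scaling policy. The thresholds and adjustments below are illustrative, not recommendations:

```python
# Illustrative step scaling logic: each (lower_bound, adjustment) pair
# mirrors a StepAdjustment in a step scaling policy. Values are examples.
STEPS = [
    (0, 0),     # below 50% CPU: no change
    (50, 1),    # 50-70%: add 1 instance
    (70, 2),    # 70-90%: add 2 instances
    (90, 4),    # 90% and above: add 4 instances
]

def step_adjustment(cpu_percent):
    """Return the capacity change for the highest threshold crossed."""
    change = 0
    for lower_bound, adjustment in STEPS:
        if cpu_percent >= lower_bound:
            change = adjustment
    return change
```

A reading of 85% CPU crosses the 50% and 70% thresholds, so the group would add 2 instances; a reading of 95% would add 4.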

How Does EC2 Auto Scaling Work?

EC2 instances in an auto scaling group have a different lifecycle than other EC2 instances. The lifecycle begins when the auto scaling group launches an instance, or when an instance is manually attached to a group. It ends when the instance is detached or terminated, or when the group removes the instance and terminates it.


AWS Auto Scaling flowchart

Source: AWS

Scale Out

Several events, known as “scale out” events, tell the auto scaling group to launch new compute instances and add them to the group:

  • Group size is manually increased
  • A scaling policy is active and its criterion is met, automatically increasing group size
  • A scheduled scaling action reaches its start time

When one of these events happens, the auto scaling group launches new instances using the group’s launch template (or legacy launch configuration). New instances start in the Pending state, and you can add lifecycle hooks to automatically perform actions on them before they enter service.
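A lifecycle hook of this kind can be sketched as parameters to the PutLifecycleHook API; it holds new instances in the Pending:Wait state so bootstrap work (agent installation, configuration) can run before they serve traffic. Names and values below are placeholders:

```python
# Hypothetical PutLifecycleHook parameters: pause newly launched instances
# so setup work can complete before they enter service. Values are illustrative.
lifecycle_hook = {
    "AutoScalingGroupName": "web-asg",                        # placeholder group name
    "LifecycleHookName": "install-agents",
    "LifecycleTransition": "autoscaling:EC2_INSTANCE_LAUNCHING",
    "HeartbeatTimeout": 300,      # seconds to wait before the default action applies
    "DefaultResult": "ABANDON",   # terminate the instance if setup never completes
}
```

A matching hook on the `autoscaling:EC2_INSTANCE_TERMINATING` transition can similarly delay termination, for example to drain connections or upload logs.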

Instances in Service

After an instance is created and any lifecycle hooks have run, it enters the InService state. It remains in this state until one of the following events occurs:

  • A “scale in” event that causes the scaling group to terminate the instance, in order to reduce its size
  • A user manually puts the instance into Standby
  • A user manually detaches the instance from the group
  • The instance fails a health check several times and is removed from the group, destroyed, and replaced by a new instance

Scale In

The following “scale in” events cause an auto scaling group to remove an instance from the group and destroy it:

  • A user reduces the size of the group
  • A scaling policy automatically reduces the size of the group, when a certain criterion is met
  • A scheduled event was defined to scale down the group at a given time

Be sure to define a scale-in event for every scale-out event—to prevent unchecked scaling and instance sprawl.
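One simple way to pair scale-out and scale-in is with matching scheduled actions, sketched below as parameters to the PutScheduledUpdateGroupAction API. The names, times, and capacities are illustrative:

```python
# Sketch: a paired scale-out/scale-in schedule so capacity added in the
# morning is released again at night. All values are illustrative placeholders.
scale_out_action = {
    "AutoScalingGroupName": "web-asg",
    "ScheduledActionName": "business-hours-up",
    "Recurrence": "0 8 * * MON-FRI",    # cron: 08:00 UTC on weekdays
    "DesiredCapacity": 8,
}
scale_in_action = {
    "AutoScalingGroupName": "web-asg",
    "ScheduledActionName": "business-hours-down",
    "Recurrence": "0 20 * * MON-FRI",   # cron: 20:00 UTC on weekdays
    "DesiredCapacity": 2,
}
```

Defining both actions together makes the capacity cycle explicit and prevents a scheduled scale-out from leaving extra instances running indefinitely.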

4 AWS EC2 Auto Scaling Best Practices

Here are several best practices that can help you manage EC2 scaling more effectively. 

EC2 Instance Frequency

Ensure Amazon EC2 Auto Scaling is based on load metrics with a frequency of one minute. This enables a faster response to changes in application usage. Using a scaling metric with a frequency of five minutes slows response time, and can result in scaling events based on stale data. 

By default, EC2 provides basic monitoring, which reports metrics every five minutes. For auto scaling based on EC2 metrics, it is recommended to enable detailed monitoring, which reports metrics every minute. Note that this incurs an additional charge.
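Two related settings are involved, sketched below as API parameters with placeholder names: detailed monitoring on the launch template makes EC2 publish instance metrics every minute, and EnableMetricsCollection publishes group-level metrics at one-minute granularity:

```python
# Sketch (placeholder names): enable one-minute metrics for scaling decisions.

# In the launch template data: detailed monitoring for EC2 instance metrics
# (one-minute frequency; incurs an additional CloudWatch charge).
launch_template_monitoring = {"Monitoring": {"Enabled": True}}

# EnableMetricsCollection parameters: group-level Auto Scaling metrics.
enable_group_metrics = {
    "AutoScalingGroupName": "web-asg",   # placeholder group name
    "Granularity": "1Minute",            # the only granularity Auto Scaling supports
}
```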

Auto Scaling Group Health Check

Make sure that the health check feature is configured correctly to detect that EC2 instances registered with an auto scaling group are functioning normally. Otherwise an auto scaling group cannot perform basic functions like removing and replacing failed instances.

If you are using Elastic Load Balancing (ELB) to distribute traffic between instances in an auto scaling group, make sure ELB health checks are enabled. Combined with the default EC2 status checks, this verifies instance health at both the hypervisor and application level.
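Switching a group to ELB health checks can be sketched as parameters to the UpdateAutoScalingGroup API; the group name and grace period below are placeholders:

```python
# Hypothetical UpdateAutoScalingGroup parameters: replace instances that
# fail load balancer health checks, not only EC2 status checks.
update_health_check = {
    "AutoScalingGroupName": "web-asg",   # placeholder group name
    "HealthCheckType": "ELB",            # default is "EC2" (status checks only)
    "HealthCheckGracePeriod": 300,       # seconds to let a new instance boot
                                         # before health checks count against it
}
```

The grace period matters: without it, instances that are still bootstrapping can be marked unhealthy and replaced before they ever come online.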

Predictive Scaling Forecast

Predictive scaling uses workload forecasting to plan future capacity. Predictions will be of higher quality if workloads have a cyclical performance pattern. Try running predictive scaling in “forecast only” mode, to evaluate the quality of the predictions and scaling actions the policy generates. If you are satisfied with the predictions, set the policy to “forecast and scale”.
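The forecast-only evaluation described above can be sketched as parameters to the PutScalingPolicy API; the names and target value are illustrative:

```python
# Hypothetical PutScalingPolicy parameters for predictive scaling in
# forecast-only mode. Names and the target value are placeholders.
predictive_policy = {
    "AutoScalingGroupName": "web-asg",
    "PolicyName": "predictive-cpu",
    "PolicyType": "PredictiveScaling",
    "PredictiveScalingConfiguration": {
        "Mode": "ForecastOnly",        # generate forecasts without scaling
        "MetricSpecifications": [{
            "TargetValue": 50.0,       # target average CPU utilization (%)
            "PredefinedMetricPairSpecification": {
                "PredefinedMetricType": "ASGCPUUtilization",
            },
        }],
    },
}
```

Once the forecasts look trustworthy, changing `Mode` to `"ForecastAndScale"` lets the policy act on its predictions.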

Auto Scaling Group Notifications

If you don’t have any other monitoring mechanism for auto scaling, make sure your auto scaling group is configured to send email notifications upon scale-out and scale-in events. When notifications are enabled, an Amazon SNS topic associated with the auto scaling group receives scaling events and sends notifications to the email address you specified during setup.
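Wiring a group to an SNS topic can be sketched as parameters to the PutNotificationConfiguration API; the group name and topic ARN below are placeholders, and an email address subscribed to the topic receives the notifications:

```python
# Hypothetical PutNotificationConfiguration parameters: publish scaling
# events to an SNS topic. The ARN and group name are placeholders.
notification_config = {
    "AutoScalingGroupName": "web-asg",
    "TopicARN": "arn:aws:sns:us-east-1:123456789012:asg-events",
    "NotificationTypes": [
        "autoscaling:EC2_INSTANCE_LAUNCH",
        "autoscaling:EC2_INSTANCE_TERMINATE",
        "autoscaling:EC2_INSTANCE_LAUNCH_ERROR",
        "autoscaling:EC2_INSTANCE_TERMINATE_ERROR",
    ],
}
```

Including the two error types is worthwhile: failed launches and terminations are exactly the events that otherwise go unnoticed without separate monitoring.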

EC2 Autoscaling with Elastigroup

Elastigroup provides AI-driven prediction of spot instance interruptions, and automated workload rebalancing with an optimal blend of spot, reserved and on-demand instances. It lets you leverage spot instances to reduce costs in AWS, even for production and mission-critical workloads, with low management overhead.

Key features of Elastigroup include:

  • Predictive rebalancing—identifies spot instance interruptions up to an hour in advance, allowing for graceful draining and workload placement on new instances, whether spot, reserved or on-demand.  
  • Advanced auto scaling—simplifies the process of defining scaling policies, identifying peak times, automatically scaling to ensure the right capacity in advance.
  • Optimized cost and performance—keeps your cluster running at the best possible performance while using the optimal mix of on-demand, spot and reserved instances. 
  • Enterprise-grade SLAs—constantly monitors and predicts spot instance behavior, capacity trends, pricing, and interruption rates. Acts in advance to add capacity whenever there is a risk of interruption.
  • Intelligent utilization of AWS Savings Plans and RIs—ensures that whenever there are unused reserved capacity resources, these will be used before spinning up new spot instances, driving maximum cost-efficiency.
  • Visibility—lets you visualize cluster activity and costs, with live views of potential and actual costs, resource utilization, and running instances. You can set budgets per cluster and receive notification alerts about budget deviations.
  • Application aware—matches scaling behavior to the type of workload, can add or remove servers from load balancers, use health checks to monitor health, and provide excess capacity for stateful applications without risking data integrity.

Learn more about Elastigroup