AWS Auto Scaling: Scaling EC2, ECS, RDS, and More

What is AWS Auto Scaling?

AWS Auto Scaling is an Amazon service that lets you configure automatic scaling of AWS resources. It increases the compute or storage capacity available to applications when load increases, and reduces that capacity when it is no longer needed.

The AWS Auto Scaling Console provides a single user interface to use the auto scaling capabilities of various AWS services. AWS Auto Scaling can be used to scale Amazon Elastic Compute Cloud (EC2), EC2 Spot Fleet requests, Elastic Container Service (ECS), DynamoDB, and Amazon Aurora.

AWS Auto Scaling lets you configure and manage scalability using scaling strategies, which define how to optimize resource usage: for availability, for cost, or for a balance of the two. You can also create custom scaling strategies.

You can also use scaling plans, which are sets of policies that adjust resources using dynamic or predictive scaling.

Autoscaling Services on AWS Cloud Platform

Let’s briefly review how AWS Auto Scaling can help you manage scalability for common AWS services.

EC2 Instance Auto Scaling

EC2 Auto Scaling helps you maintain the number of EC2 instances your application needs to handle incoming traffic.

You can create EC2 Auto Scaling groups, which are collections of EC2 instances. Set a minimum size so that the group never shrinks below it (if an instance fails, it is replaced). Set a maximum size and the group will never grow beyond it.

In addition, you can:

  • Manually add or remove EC2 instances from auto scaling groups (this is called manual scaling) 
  • Change minimum or maximum capacity of a group on a predefined schedule
  • Set scaling plans that dynamically scale groups up and down (learn more about scaling plans below)
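The size limits described above can be sketched as a small model. This is a toy illustration, not an AWS API: the `ScalingGroup` class and its `apply_adjustment` method are hypothetical names, and the sketch assumes that a scaling adjustment pushing the group past a limit is clamped to that limit.

```python
from dataclasses import dataclass


@dataclass
class ScalingGroup:
    """Toy model of an Auto Scaling group's size limits (hypothetical, not an AWS API)."""
    min_size: int
    max_size: int
    desired: int

    def apply_adjustment(self, delta: int) -> int:
        # Assumption: an adjustment that would leave the allowed range
        # is clamped to [min_size, max_size].
        self.desired = max(self.min_size, min(self.max_size, self.desired + delta))
        return self.desired


group = ScalingGroup(min_size=2, max_size=10, desired=4)
print(group.apply_adjustment(+20))  # 10: capped at the maximum
print(group.apply_adjustment(-50))  # 2: never below the minimum
```

Whatever triggers the change (a manual action, a schedule, or a scaling policy), the group size in this model always ends up between the two limits.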

Learn more in our detailed guide to EC2 autoscaling

Amazon EC2 Spot Fleet Requests

A spot instance is an Amazon EC2 instance provided at a discount of up to 90%, because Amazon currently has spare capacity of this instance type in a specific availability zone. Spot instances can be interrupted with two minutes’ notice.

A Spot Fleet is a grouping of EC2 spot instances, based on custom criteria. Spot Fleets are created by spot fleet requests, which specify how much capacity is needed, how much of it should be made up of on-demand instances, which types of spot instances are required, and a maximum price.

There are two types of spot fleet requests:

  • Request: a one-time request for capacity. If there are not enough spot instances that meet your criteria, you get less capacity, and instances that are later interrupted are not replaced.
  • Maintain: Spot Fleet maintains the desired capacity over time, replacing spot instances that are interrupted.

AWS Auto Scaling can automatically adjust the capacity of a Spot Fleet, based on demand. It supports the following scaling policies:

  • Target tracking scaling—adjusts capacity of the spot fleet based on a load metric, such as CPU utilization. Adds or removes instances to ensure the load metric is maintained at the desired level.
  • Step scaling—adds or removes instances in steps, performing a scaling adjustment when a load metric reaches a certain threshold. 
  • Scheduled scaling—adjusts the number of instances at a predefined date and time.
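Of the three policy types, step scaling is the least obvious, so here is a minimal sketch of how a set of steps could map a metric breach to a capacity change. The step boundaries, the threshold of 70, and the function name `step_adjustment` are all illustrative assumptions; the sketch treats each step's lower bound as inclusive and its upper bound as exclusive.

```python
from typing import List, Optional, Tuple

# (lower_bound, upper_bound, adjustment). Bounds are offsets from the alarm
# threshold; None means unbounded. All values are illustrative assumptions.
Step = Tuple[Optional[float], Optional[float], int]

SCALE_OUT_STEPS: List[Step] = [
    (0, 10, 1),      # breach of 0 to <10 points: add 1 instance
    (10, 20, 2),     # breach of 10 to <20 points: add 2 instances
    (20, None, 4),   # breach of 20 points or more: add 4 instances
]


def step_adjustment(metric: float, threshold: float, steps: List[Step]) -> int:
    """Return the capacity change for the step that contains the breach size."""
    breach = metric - threshold
    for lower, upper, adjustment in steps:
        above_lower = lower is None or breach >= lower
        below_upper = upper is None or breach < upper
        if above_lower and below_upper:
            return adjustment
    return 0  # no step matched, so no scaling action


print(step_adjustment(metric=85, threshold=70, steps=SCALE_OUT_STEPS))  # 2
```

The larger the breach, the larger the adjustment, which is what distinguishes step scaling from a single fixed-size scaling action.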

Elastic Container Service (ECS) Auto Scaling

ECS auto scaling can be triggered by CloudWatch metrics available for ECS services, such as CPU and memory usage. AWS Auto Scaling automatically increases or decreases the number of ECS tasks. To handle a large volume of incoming requests, use CloudWatch metrics to add more tasks, and to remove tasks when load decreases.

ECS auto scaling can also use scaling policies such as step scaling and scheduled scaling (see the scaling plans section below).
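As a rough sketch of what a CPU-based configuration might look like, the function below builds the two parameter sets used by the Application Auto Scaling API for an ECS service: one to register the service as a scalable target and one to attach a target tracking policy. The code only builds dictionaries and does not call AWS. The function name, cluster name, service name, and policy name are hypothetical; with boto3, dictionaries like these would typically be passed as keyword arguments to the `application-autoscaling` client's `register_scalable_target` and `put_scaling_policy` methods.

```python
def ecs_target_tracking_request(cluster: str, service: str,
                                min_tasks: int, max_tasks: int,
                                target_cpu_percent: float) -> dict:
    """Build (but do not send) Application Auto Scaling parameters for an ECS service."""
    common = {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
    }
    return {
        # Limits on the number of running tasks.
        "register_scalable_target": {
            **common,
            "MinCapacity": min_tasks,
            "MaxCapacity": max_tasks,
        },
        # Keep average service CPU near the target by adding or removing tasks.
        "put_scaling_policy": {
            **common,
            "PolicyName": f"{service}-cpu-target-tracking",  # hypothetical name
            "PolicyType": "TargetTrackingScaling",
            "TargetTrackingScalingPolicyConfiguration": {
                "TargetValue": target_cpu_percent,
                "PredefinedMetricSpecification": {
                    "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
                },
            },
        },
    }


request = ecs_target_tracking_request("demo-cluster", "web", 2, 10, 60.0)
print(request["put_scaling_policy"]["PolicyName"])
```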

RDS Storage Auto Scaling

RDS auto scaling provides automated storage scaling for MySQL, PostgreSQL, MariaDB, SQL Server, and Oracle databases. RDS monitors database storage utilization, and when current usage is close to the provisioned size, it scales up storage capacity available to the database instance. 

Scaling events are performed with no downtime, without affecting current database operations or interfering with current transactions.
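The "usage is close to the provisioned size" rule can be illustrated with a simplified decision function. The numbers here (a 10% free-space trigger, a 10% growth step, a 10 GiB minimum increment) are assumptions chosen for illustration, and the real service applies additional conditions, so check the RDS documentation for the actual behavior. The cap corresponds to the maximum storage threshold you configure for the instance.

```python
def next_allocated_storage(allocated_gib: int, free_gib: int,
                           max_allocated_gib: int,
                           free_percent_trigger: int = 10,
                           growth_percent: int = 10,
                           min_increment_gib: int = 10) -> int:
    """Return the new allocated size in GiB, or the current size if no scaling is needed.

    All default thresholds are illustrative assumptions, not documented values
    taken from this article.
    """
    # Enough free space left: nothing to do (integer math avoids rounding issues).
    if free_gib * 100 > allocated_gib * free_percent_trigger:
        return allocated_gib
    # Grow by a percentage of the current size, rounded up, with a floor.
    proportional = -(-allocated_gib * growth_percent // 100)  # ceiling division
    increment = max(min_increment_gib, proportional)
    # Never exceed the configured maximum storage threshold.
    return min(max_allocated_gib, allocated_gib + increment)


print(next_allocated_storage(allocated_gib=500, free_gib=40, max_allocated_gib=1000))  # 550
```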

DynamoDB Auto Scaling

In DynamoDB database workloads, it is challenging to estimate required read and write capacity. Applications may require a high throughput for only a short time. DynamoDB Auto Scaling dynamically adjusts capacity based on actual inbound traffic patterns.

When workload throughput decreases, Auto Scaling automatically decreases the number of capacity units, avoiding payment for unneeded capacity.

DynamoDB Auto Scaling works by creating scaling policies for a table or a secondary index. In the scaling policy, you specify whether to scale read capacity, write capacity, or both, along with the minimum and maximum provisioned capacity units for the table or index.
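To make the role of the minimum and maximum limits concrete, here is a minimal sketch of how provisioned capacity could be derived from consumed capacity and a target utilization. The function name and the 70% target are assumptions for illustration; the actual service reacts to CloudWatch alarms rather than computing this formula directly.

```python
import math


def provisioned_units(consumed_units: float, target_utilization_percent: int,
                      min_units: int, max_units: int) -> int:
    """Capacity units needed so that consumption sits at the target utilization."""
    needed = math.ceil(consumed_units * 100 / target_utilization_percent)
    # The policy's minimum and maximum always bound the result.
    return max(min_units, min(max_units, needed))


# 350 consumed units at a 70% target calls for 500 provisioned units.
print(provisioned_units(350, 70, min_units=5, max_units=1000))  # 500
```

When traffic drops, the same calculation yields a lower value, which is how unneeded capacity stops being paid for, down to the configured minimum.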

Using Scaling Plans in AWS Auto Scaling

Scaling plans are a key component of AWS Auto Scaling. A scaling plan provides a set of instructions for scaling resources up and down. If you use AWS CloudFormation or add tags to your AWS resources, you can set up a different scaling plan for each group of resources.

AWS Auto Scaling analyzes the behavior of each resource and provides recommendations for customized scaling strategies. After a scaling plan is created, Auto Scaling executes it by combining dynamic scaling and predictive scaling methods:

  • Dynamic scaling adapts capacity to actual loads to optimize resource utilization
  • Predictive scaling creates a forecast of future loads and performs scaling actions to meet expected load
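One way to picture how the two methods combine is sketched below. It assumes a deliberately naive forecast (the average of loads seen at the same hour on earlier days) and assumes that the forecast acts as a floor on capacity while dynamic scaling can raise capacity above it. The function names and the `load_per_instance` parameter are hypothetical; this is not the forecasting model AWS uses.

```python
import math
from typing import Sequence


def forecast_load(history: Sequence[float]) -> float:
    """Naive forecast: mean of loads observed at the same hour on previous days."""
    return sum(history) / len(history)


def planned_capacity(history: Sequence[float], current_load: float,
                     load_per_instance: float, max_capacity: int) -> int:
    # Predictive part: capacity needed for the forecast load.
    predicted = math.ceil(forecast_load(history) / load_per_instance)
    # Dynamic part: capacity needed for the load observed right now.
    dynamic = math.ceil(current_load / load_per_instance)
    # Assumption: the forecast sets a floor and dynamic scaling may exceed it.
    return min(max_capacity, max(predicted, dynamic, 1))


# Forecast suggests 10 instances, but the current load needs 15.
print(planned_capacity([900, 1100, 1000], current_load=1500,
                       load_per_instance=100, max_capacity=20))  # 15
```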

[Figures: AWS Auto Scaling with and without dynamic scaling; AWS predictive scaling. Source: Amazon Web Services]

Below are several common options for scaling plans in AWS Auto Scaling.

Maintain Current Instance Levels

Configure Auto Scaling to maintain a specified number of instances indefinitely. Amazon EC2 Auto Scaling periodically checks the health of the instances in the group. When an unhealthy instance is detected, it is terminated and a replacement instance is launched. This ensures the required number of instances is always running.
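The health check and replacement cycle amounts to a reconciliation loop, sketched here in simplified form. The `reconcile` function and the `is_healthy` and `launch` callbacks are hypothetical stand-ins for whatever health check and launch mechanism is in use; the instance IDs are made up.

```python
from typing import Callable, List


def reconcile(instances: List[str], desired: int,
              is_healthy: Callable[[str], bool],
              launch: Callable[[], str]) -> List[str]:
    """Drop unhealthy instances and launch replacements until `desired` are running."""
    # Instances that fail the health check are removed (terminated).
    running = [instance for instance in instances if is_healthy(instance)]
    # Launch new instances until the group is back at the desired size.
    while len(running) < desired:
        running.append(launch())
    return running


new_ids = iter(["i-new1", "i-new2"])
print(reconcile(["i-a", "i-b", "i-c"], desired=3,
                is_healthy=lambda instance: instance != "i-b",
                launch=lambda: next(new_ids)))
# ['i-a', 'i-c', 'i-new1']
```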

Scale According to Fixed Schedule

You can schedule scaling to occur automatically on specific dates and times. This is especially useful when you can accurately forecast demand: instead of relying on predictive scaling, you decide how much capacity to allocate at a given time. A typical case is an unusual but known spike in demand, such as the run-up to a holiday sale.
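A fixed schedule can be thought of as a lookup from time windows to capacity, as in this small sketch. The `capacity_at` function, the dates, and the capacities are illustrative assumptions and are not tied to any AWS API.

```python
from datetime import datetime
from typing import List, Tuple

# (window start, window end, capacity during the window)
ScheduledAction = Tuple[datetime, datetime, int]


def capacity_at(now: datetime, default_capacity: int,
                schedule: List[ScheduledAction]) -> int:
    """Return the scheduled capacity for `now`, or the default outside any window."""
    for start, end, capacity in schedule:
        if start <= now < end:  # start is inclusive, end is exclusive
            return capacity
    return default_capacity


# Hypothetical holiday sale: run 20 instances for one day, 4 otherwise.
sale = [(datetime(2024, 11, 29), datetime(2024, 11, 30), 20)]
print(capacity_at(datetime(2024, 11, 29, 12, 0), default_capacity=4, schedule=sale))  # 20
```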

Scale According to Demand

AWS Auto Scaling can scale resources according to actual application load. Select a load metric that is representative of how your resources respond to load; CPU utilization and memory utilization are typically good choices. When load shifts, Auto Scaling increases or decreases resources to keep the load metric close to its target level.
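The idea of holding a load metric at a target can be approximated with a proportional rule, shown below. This is a simplified model under the assumption that the metric scales inversely with capacity; it is not the exact algorithm AWS applies, and the function name is hypothetical.

```python
import math


def target_tracking_capacity(current_capacity: int, metric_value: float,
                             target_value: float,
                             min_capacity: int, max_capacity: int) -> int:
    """Capacity that would bring the metric back to the target (proportional model)."""
    # If 4 instances run at 90% and the target is 60%, about 4 * 90 / 60 = 6 are needed.
    desired = math.ceil(current_capacity * metric_value / target_value)
    return max(min_capacity, min(max_capacity, desired))


print(target_tracking_capacity(4, metric_value=90, target_value=60,
                               min_capacity=2, max_capacity=10))  # 6
print(target_tracking_capacity(6, metric_value=30, target_value=60,
                               min_capacity=2, max_capacity=10))  # 3
```

The same rule scales out when the metric is above target and scales in when it is below, which is why a single target value is enough to drive both directions.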

AWS Autoscaling with Elastigroup

Elastigroup provides AI-driven prediction of spot instance interruptions, and automated workload rebalancing with an optimal blend of spot, reserved and on-demand instances. It lets you leverage spot instances to reduce costs in AWS, even for production and mission-critical workloads, with low management overhead.

Key features of Elastigroup include:

  • Predictive rebalancing—identifies spot instance interruptions up to an hour in advance, allowing for graceful draining and workload placement on new instances, whether spot, reserved or on-demand.  
  • Advanced auto scaling—simplifies the process of defining scaling policies, identifying peak times, automatically scaling to ensure the right capacity in advance.
  • Optimized cost and performance—keeps your cluster running at the best possible performance while using the optimal mix of on-demand, spot and reserved instances. 
  • Enterprise-grade SLAs—constantly monitors and predicts spot instance behavior, capacity trends, pricing, and interruption rates. Acts in advance to add capacity whenever there is a risk of interruption.
  • Intelligent utilization of AWS Savings Plans and RIs—ensures that whenever there are unused reserved capacity resources, these will be used before spinning up new spot instances, driving maximum cost-efficiency.
  • Visibility—lets you visualize cluster activity and costs, with live views of potential and actual costs, resource utilization, and running instances. You can set budgets per cluster and receive notification alerts about budget deviations.
  • Application aware—matches scaling behavior to the type of workload, can add or remove servers from load balancers, use health checks to monitor health, and provide excess capacity for stateful applications without risking data integrity.

Learn more about Elastigroup