How Duolingo Reduced Their ECS Costs by 65%
When organizations move their workloads to public cloud providers, they quickly realize how expensive it can be. One way many users try to reduce costs is by using reserved instances: capacity sold with a large upfront cost and commitment in exchange for a lower overall price than on-demand. The problem is that some organizations do not utilize their reserved instances effectively and find it hard to combine them with other instance types as needed when scaling.
Duolingo made the switch to microservices and eventually ran their workloads on Amazon ECS, managed with Terraform. During the transition to ECS, they saw their costs rise and attempted to reduce them by combining reserved instances with EC2 spot instances. As time went on, Duolingo realized they were not using reserved instances effectively, because they could not prioritize them ahead of spot instances. In the end, they were left with unutilized reserved instances, which is money wasted.
Different workloads require different compute resources. Some are optimized for memory, CPU, network, or a combination of all three. One way to reduce costs is to combine different instance types and sizes to make better use of infrastructure. However, Duolingo found out the hard way that they were not able to do this on ECS. As a result, they were paying more.
Duolingo began their search for a solution that could help them reduce costs, mix instance types and sizes, utilize reserved instances more efficiently, and work with Terraform to simplify deploying their workloads on AWS.
Duolingo decided to try Elastigroup by Spot to see if it could help them utilize their reserved instances more efficiently. Elastigroup automatically finds all unutilized reservations and prioritizes using them before launching spot instances, making sure that applications are running on the best possible mix of instances.
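The reservation-first idea can be illustrated with a minimal Python sketch. This is not Elastigroup's actual algorithm; the `Allocation` type and `allocate` function are hypothetical names, and the sketch assumes a single pool of interchangeable instances for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    reserved: int  # instances covered by already-paid-for reservations
    spot: int      # instances that still have to come from the spot market


def allocate(desired: int, unused_reservations: int) -> Allocation:
    """Fill demand from unused reservations first, then fall back to spot.

    Hypothetical helper for illustration only.
    """
    if desired < 0 or unused_reservations < 0:
        raise ValueError("counts must be non-negative")
    # Use as many idle reservations as the demand allows.
    reserved = min(desired, unused_reservations)
    # Whatever demand remains is served by spot instances.
    return Allocation(reserved=reserved, spot=desired - reserved)
```

Under these assumptions, a request for 10 instances with 4 idle reservations would use all 4 reservations and launch 6 spot instances, so no reservation sits unused while spot capacity is being paid for.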
When creating a new Elastigroup, users have the ability to specify a diverse set of instance types and sizes to use. Specifying more instance types means that Elastigroup will have more flexibility in choosing the most cost-effective and reliable spot instance to use. This is all possible through the Spot Market scoring features of Elastigroup.
Spot Instance Market scoring helped Duolingo choose the best spot instance markets by providing a visual aid that shows the number of separate spot instance markets available based on the number of Availability Zones and spot instance types selected. When using multiple instance types across multiple Availability Zones, they were presented with more savings possibilities from the Spot Market.
Duolingo has over 300 million users who complete over 7 billion language exercises each month. This means that they need to be able to scale quickly and autonomously. After they imported their existing ECS deployment into Elastigroup, they were able to reduce costs by running their ECS workloads on spot instances with a mix of different instance types, and they were able to automate infrastructure scaling with Elastigroup’s ECS Autoscaler.
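To see why adding instance types and Availability Zones widens the choice, here is a small Python sketch that treats each (instance type, Availability Zone) pair as one spot market. The function name `spot_markets` and the example instance types and zones are illustrative assumptions, not values from Duolingo's setup.

```python
from itertools import product
from typing import Iterable, List, Tuple


def spot_markets(
    instance_types: Iterable[str], availability_zones: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return every (instance type, Availability Zone) pair, deduplicated.

    Assumes each pair is a separate spot market with its own price
    and capacity.
    """
    return sorted(product(set(instance_types), set(availability_zones)))


# Example values are hypothetical.
markets = spot_markets(
    ["m5.large", "c5.large", "r5.large"],
    ["us-east-1a", "us-east-1b"],
)
print(len(markets))
```

With three instance types and two Availability Zones, the sketch yields six candidate markets, compared with one market for a single type in a single zone.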
The Elastigroup ECS Autoscaler dynamically scales the cluster up and down to ensure there are always sufficient resources to run all tasks. This is done by optimizing task placement across the cluster in a process we call Tetris Scaling, and by automatically managing Headroom – a buffer of spare capacity (memory and CPU) that makes sure you can scale more containers without having to wait for new instances to be provisioned.
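The interplay between task placement and Headroom can be sketched in Python. The source does not describe how Tetris Scaling works internally, so this example swaps in a textbook first-fit decreasing bin-packing heuristic and models Headroom as placeholder units that are packed like tasks. All names (`Instance`, `instances_needed`) and the assumption of identical instances are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Task = Tuple[int, int]  # (cpu_units, memory_mib)


@dataclass
class Instance:
    cpu_free: int
    mem_free: int

    def fits(self, task: Task) -> bool:
        return task[0] <= self.cpu_free and task[1] <= self.mem_free

    def place(self, task: Task) -> None:
        self.cpu_free -= task[0]
        self.mem_free -= task[1]


def instances_needed(
    tasks: List[Task], headroom: List[Task], cpu: int, mem: int
) -> int:
    """Count identical instances needed for the tasks plus headroom.

    First-fit decreasing: largest units first, each onto the first
    instance with room, adding a new instance only when none fits.
    """
    instances: List[Instance] = []
    # Headroom units reserve spare capacity by being packed like tasks.
    for unit in sorted(tasks + headroom, reverse=True):
        if unit[0] > cpu or unit[1] > mem:
            raise ValueError(f"unit {unit} does not fit on an empty instance")
        target = next((i for i in instances if i.fits(unit)), None)
        if target is None:
            target = Instance(cpu, mem)
            instances.append(target)
        target.place(unit)
    return len(instances)
```

In this model, four tasks of 1024 CPU units and 2048 MiB fit on two instances of 2048 CPU units and 4096 MiB. Adding one headroom unit of the same size requires a third instance, which is the spare capacity that lets a new container start without waiting for provisioning.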
Duolingo uses Terraform to deploy their infrastructure and applications and did not want to change their workflow, because doing so can be time-consuming and costly. With Elastigroup’s support for Terraform via a plugin, Duolingo was able to deploy their workloads onto spot instances and reduce costs by efficiently managing their reserved and spot instances without changing their workflows.
After using Elastigroup in production, Duolingo was able to reduce their compute costs on AWS by 65%. These savings were made possible by Elastigroup’s ability to effectively utilize reserved and spot instances together. Further savings came from Elastigroup intelligently choosing the most cost-effective and reliable instance type and size for their workload. With the Elastigroup ECS Autoscaler, Duolingo was able to simplify operations and take a hands-off approach to managing their infrastructure. Most importantly, they did not need to make major changes to their existing workflow because Elastigroup worked natively with Terraform.