PubNative Decreases Cost By 80% Using Elastigroup by Spot For Their Mesos, Spark, ElasticSearch And Presto Clusters
A diverse Apache Mesos environment consisting of Chronos, Elasticsearch, Kibana, Logstash, and Spark, running on AWS spot instances
PubNative’s clients are dispersed globally and need real-time data analytics. This requires complex orchestration across multiple AWS regions and accounts. Over time, this resulted in additional management overhead and wasted compute capacity.
“By building our platform on Apache Mesos we were able to achieve a decent amount of optimization on our instances, but we found that we were still wasting compute capacity by utilizing homogeneous EC2 clusters. If there is a small spike in demand it makes more sense to spin up a smaller instance size (less vCPUs / memory) than a larger one. This is very difficult to manage without some sort of automation.”
Amir Friedman, Director of Engineering at PubNative
As growth exploded, so did the infrastructure costs. PubNative tried to manage these costs by utilizing spot instances for their Mesos clusters. They quickly found that the cost benefits came with management challenges as well. “We tried managing the purchasing of spot instances on our own, but we quickly found that the integration with AWS OpsWorks and Beanstalk was non-existent. After attempting to build our own tooling for managing the Spot Market, we quickly found that the management overhead was still far too much to make it a viable option as part of our continuous deployment process,” says Friedman.
The Benefits of using the Spot Platform with Apache Mesos and Marathon
PubNative has a diverse Apache Mesos environment consisting of Chronos, Elasticsearch, Kibana, Logstash, Spark (for data processing), and Presto workers. Marathon will move these containers around to optimize a cluster of instances. If additional vCPUs are required, Spot will automatically place a bid for a new spot instance from within a pool of different instance types (m4.large, m4.xlarge, m3.medium, for example).
In most cases, the Spot platform is able to place a bid on a larger instance size for less than a smaller size would cost.
“This is basically a double win since you are paying less money for even greater power than the size that was originally requested.”
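The selection logic described above can be illustrated with a minimal sketch. It is not Spot's actual bidding algorithm, which the article does not describe; it only shows how choosing the cheapest offer from a mixed pool can land on a larger instance. The names `SpotOffer`, `cheapest_offer`, and `POOL` are hypothetical, and the prices are made-up numbers, not real spot market data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SpotOffer:
    instance_type: str
    vcpus: int
    memory_gib: float
    price_per_hour: float  # hypothetical spot price in USD, not real market data


def cheapest_offer(
    offers: List[SpotOffer],
    required_vcpus: int,
    required_memory_gib: float = 0.0,
) -> Optional[SpotOffer]:
    """Return the lowest-priced offer that meets the requested capacity."""
    candidates = [
        o for o in offers
        if o.vcpus >= required_vcpus and o.memory_gib >= required_memory_gib
    ]
    if not candidates:
        return None  # nothing in the pool is large enough
    return min(candidates, key=lambda o: o.price_per_hour)


# Illustrative pool using the instance types named in the article.
# The prices are invented so that the larger m4.xlarge is cheaper than m4.large.
POOL = [
    SpotOffer("m3.medium", vcpus=1, memory_gib=3.75, price_per_hour=0.011),
    SpotOffer("m4.large", vcpus=2, memory_gib=8.0, price_per_hour=0.034),
    SpotOffer("m4.xlarge", vcpus=4, memory_gib=16.0, price_per_hour=0.029),
]

if __name__ == "__main__":
    choice = cheapest_offer(POOL, required_vcpus=2)
    print(choice.instance_type if choice else "no suitable offer")
```

With these assumed prices, a request for 2 vCPUs resolves to the m4.xlarge rather than the m4.large, which is the "double win" case: more capacity for a lower hourly price.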
This diversification of instance types has added benefits like vCPU and memory weighting. “We use Marathon to orchestrate our containers. If a certain container can benefit from a compute or memory optimized instance type, Marathon will automatically move this container to that instance.” The result is incredible cost and compute optimization with less management overhead.
In addition to Mesos, PubNative has been using the Sidekiq platform to manage job queues. The first layer of this stack is a web layer which receives requests from clients. The second layer is a cluster of scalable worker instances to process the queue. The workers are created as an Elastigroup by Spot to ensure cost optimization and availability.
As an added bonus, PubNative has found additional uses for Spot beyond just cost savings and stability.
“The RI Utilization graph has been a powerful tool to ensure we’re using the RI capacity that we have already invested in. The Spot console allows us to see cost savings across multiple AWS Regions and Accounts in a single concise dashboard. Our finance department especially loves to see our consistent 70% savings over on-demand instances.”
Amir Friedman, Director of Engineering.