The challenge: Reducing cloud compute costs

At first, Haptik ran mostly on AWS on-demand instances, alongside a small set of reserved instances. Cloud costs rose quickly, and cutting them became a major focus as a way to lower cost of goods sold (COGS).

When Haptik first looked at reducing cloud computing costs, it considered reserved instances but deemed them too inflexible to roll out on a large scale. This led the team to start experimenting with the AWS Spot Market. The experiment bore early fruit, as AWS spot instances were available at prices 70-80% lower than on-demand instances.
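To make that discount range concrete, here is a rough back-of-the-envelope calculation. The hourly price and fleet size are hypothetical placeholders, not Haptik's figures, and real spot prices vary by instance type, region, and time.

```python
HOURS_PER_MONTH = 730  # common approximation for an average month


def monthly_cost(instances: int, hourly_price: float, hours: int = HOURS_PER_MONTH) -> float:
    """Monthly bill for a fleet that runs around the clock."""
    return instances * hourly_price * hours


def spot_price(on_demand_price: float, discount: float) -> float:
    """Spot price implied by a fractional discount off the on-demand price."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return on_demand_price * (1.0 - discount)


# Hypothetical inputs, chosen only for illustration.
ON_DEMAND_HOURLY = 0.10  # USD per instance-hour
FLEET_SIZE = 20

on_demand_bill = monthly_cost(FLEET_SIZE, ON_DEMAND_HOURLY)
print(f"on-demand: ${on_demand_bill:,.2f} per month")

for discount in (0.70, 0.80):
    spot_bill = monthly_cost(FLEET_SIZE, spot_price(ON_DEMAND_HOURLY, discount))
    print(f"spot at {discount:.0%} off: ${spot_bill:,.2f} per month")
```

The calculation ignores interruptions and any on-demand fallback hours, which is exactly the operational gap described next.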

Unfortunately, managing spot instances was very complex and required a lot of internal focus and maintenance. The team built automation scripts to help handle the process, but the risk of running a production environment on spot instances was still high. In search of an automatic and reliable solution for managing spot instances, Ranvijay Jamwal, Engineering Manager – Architecture & DevOps at Haptik, found Spot. Today, Spot is a major element of their cost-reduction strategy.

“We are currently saving at least 85% costs on EC2 instances using Elastigroup by Spot,” says Jamwal. “There are many features available now when using Spot platform which we easily use to improve our performance while keeping the costs down, one example is that we are checking for Idle resources and releasing them to save costs.”

So Why Spot?

Spot is an online cloud management platform that allows companies to run their mission-critical applications on the excess capacity of cloud providers, saving up to 85% on costs. Spot supports AWS, Azure, Google Cloud, and Packet.

The main Spot product utilized by Haptik is Elastigroup by Spot, a software layer on top of the cloud infrastructure that functions as a cost-oriented Auto Scaling group. Elastigroup uses predictive algorithms to forecast spot behavior, capacity trends, pricing, and interruption rates. Whenever there’s a risk of interruption, Elastigroup acts ahead of time to rebalance capacity, ensuring 100% availability and no risk of downtime. This means that your application always runs on the most cost-efficient collection of instances: the best-priced spot instances when they are available and on-demand instances as a fallback when they are not, with priority given to any reserved instances you may already own.
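The priority order described above (reserved instances first, then spot, then on-demand as a fallback) can be sketched in a few lines. This is a simplified illustration only, not Elastigroup's actual algorithm; the `Allocation` type and `allocate` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    reserved: int
    spot: int
    on_demand: int


def allocate(target: int, reserved_owned: int, spot_available: int) -> Allocation:
    """Fill a target instance count in priority order:
    reserved capacity already paid for, then spot, then on-demand."""
    if min(target, reserved_owned, spot_available) < 0:
        raise ValueError("counts must be non-negative")
    reserved = min(target, reserved_owned)
    remaining = target - reserved
    spot = min(remaining, spot_available)
    on_demand = remaining - spot  # fallback covers whatever spot cannot
    return Allocation(reserved, spot, on_demand)


# Plenty of spot capacity: nothing falls back to on-demand.
print(allocate(target=10, reserved_owned=2, spot_available=20))
# Spot capacity dries up: the shortfall moves to on-demand.
print(allocate(target=10, reserved_owned=2, spot_available=0))
```

Re-running the same function as spot capacity returns is what moves the group back from on-demand to spot.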

As Haptik found, the main drawback of using spot instances directly (without Elastigroup by Spot) is the time and focus consumed by bidding on spot instances and managing the server count 24/7 for your critical operations.

Top 5 features that Haptik loves about Elastigroup

  • Cluster Orientation – Elastigroup’s algorithm takes care of purchasing the right instances. You set the algorithm to optimize for one of the following:
    • Cost Optimization – Elastigroup looks for and uses the cheapest instances available.
    • Availability Based – Elastigroup looks for and uses the spot instances with the highest availability.
    • Balanced – A mix of the previous two options.
  • Fallback to on-demand – Whenever spot instances that fit your needs aren’t available, Spot falls back to on-demand instances, ensuring no risk of downtime. Spot then reverts to spot instances once they become available again.

  • Scheduling – Another way to save money: you can set various triggers (e.g. a capacity change) that spin instances up or down during specific time frames.
  • Running stateful apps on spot – Spot lets you run any application without a single point of failure on spot instances, and stateful apps are no exception. Elastigroup takes continuous snapshots of your instances with the relevant data attached. Once it predicts an interruption, it launches a new instance with the same data attached from the latest snapshot. You can also use “Maintain Private IP” for applications that rely on the private IP of the instance.
  • Scaling Policies – Just like an Auto Scaling group, you can automatically scale your instances up or down as needed. You can base your scaling strategy on anything from AWS CloudWatch metrics to Spot Spectrum metrics.
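To illustrate the difference between the three cluster orientation settings listed above, here is a toy scoring sketch. It is not how Elastigroup actually chooses instances: the `SpotMarket` type, the weights, and the prices and interruption rates are all made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpotMarket:
    instance_type: str
    hourly_price: float       # USD per hour (made-up values)
    interruption_rate: float  # fraction in [0, 1]; lower means more available


# Weight given to price; the remainder goes to the interruption rate.
PRICE_WEIGHT = {"cost": 1.0, "availability": 0.0, "balanced": 0.5}


def pick_market(markets: list[SpotMarket], orientation: str) -> SpotMarket:
    """Return the market with the lowest weighted score for the orientation."""
    if orientation not in PRICE_WEIGHT:
        raise ValueError(f"unknown orientation: {orientation}")
    if not markets:
        raise ValueError("no markets to choose from")
    weight = PRICE_WEIGHT[orientation]
    max_price = max(m.hourly_price for m in markets)
    max_rate = max(m.interruption_rate for m in markets)

    def score(m: SpotMarket) -> float:
        # Normalize both terms to [0, 1] so the weights are comparable.
        price_part = m.hourly_price / max_price if max_price else 0.0
        rate_part = m.interruption_rate / max_rate if max_rate else 0.0
        return weight * price_part + (1.0 - weight) * rate_part

    return min(markets, key=score)


MARKETS = [
    SpotMarket("type-a", hourly_price=0.030, interruption_rate=0.20),
    SpotMarket("type-b", hourly_price=0.050, interruption_rate=0.05),
    SpotMarket("type-c", hourly_price=0.036, interruption_rate=0.08),
]

for orientation in ("cost", "availability", "balanced"):
    print(orientation, "->", pick_market(MARKETS, orientation).instance_type)
```

With these sample numbers, the cost setting picks the cheapest market, the availability setting picks the least-interrupted one, and the balanced setting lands on a market that is neither the cheapest nor the most stable but scores well on both.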

Ranvijay Jamwal, Engineering Manager – Architecture and DevOps at Haptik: “Working with Spot is helping us to reduce 80% of our cloud computing costs on a monthly basis with no risk of downtime, which is crucial to our business. We’ve improved our performance with no additional IT resources and now we can invest more time and effort in our clients.”

Note: This was a guest post written by Ranvijay Jamwal of Haptik

Haptik is one of the world’s largest chatbot platforms, building applications for consumers, publishers, and enterprises. The company is at the forefront of the paradigm shift from apps to bots, building bots for an array of use cases, from e-commerce to customer service and utility to lead generation.

Haptik was born out of a personal need. The founders realized they were using text communication more than any other application on their phones. What’s now obvious to the rest of us became obvious to them – chat is doing to mobile what search has done to the internet.

https://haptik.ai/