A brief introduction to Haptik
One of the largest names in the conversational AI industry, Haptik has been at the forefront of a paradigm shift in using AI to disrupt how marketing works. With a team of some of the finest minds in AI, Haptik enables brands to improve engagement with their customers through support and lead generation. Lead generation – one of the most significant aspects of marketing – is essentially the conversion of a person merely chatting with a chatbot into someone interested in buying a product or service. Haptik functions across different channels, whether it’s WhatsApp, web chat, or a mobile chat application.
The highlight
One of the recent highlights for the platform was during the pandemic when they created a COVID-19 bot – one of the largest WhatsApp bots in the world – for the Government of India. This demonstrates how Haptik functions on a global scale by partnering with different customers across the globe.
The Background
When Haptik started out in 2013, its product was a single monolith running a machine learning pipeline, very similar to a web server serving requests. They realized pretty soon that global scaling required migration to the cloud and started the process of migrating from on-prem to AWS. At this point, they were still a B2C company doing things manually.
On the business side, they were still handling chats manually where the chatbots had agents sitting behind a system replying to the end user. On the infrastructure side, they were launching their servers manually by creating a new server and adding it under a load balancer. So, when the cloud journey started, they were manually launching a new load balancer, creating an AMI, creating a new server, putting it under the load balancer, and dealing with many such manual tasks.
The Challenges
Migrating to the cloud brought with it quite a few challenges.
1. Automating everything
Since Haptik was still doing everything manually, anyone joining the company at that point faced a learning curve of multiple manual steps just to get basic things right. Following SOPs and training every new hire was a significant effort that needed to be simplified. Even creating and configuring a new instance was a long-drawn-out process for a new team member. The Haptik team had to figure out how to automate mundane and tedious tasks without adding significant overhead.
2. Cost management
Haptik was essentially running on on-demand instances instead of using reserved instances or other cost-effective alternatives. This was proving to be a significantly costly affair, so reducing the cost of their system was a definite priority.
The Solution
Haptik chose Spot to help overcome these challenges, and chart out a path for more scalable growth in the cloud.
Elastigroup helps autoscale
Elastigroup was the entry-point for Haptik to get started with Spot. They started off integrating two to three Elastigroups in their staging environments and seamlessly migrated from staging to production.
Scaling cost less
Spot enabled Haptik to leverage preemptible VMs (aka spot instances) to save up to 90% on costs as compared to on-demand instances. Spot also made it easy to add a multitude of instances to the load balancer without costing an arm and a leg since it allowed you to get rid of unused instances after a couple of hours. All this led to Haptik’s cloud bill becoming significantly smaller.
Apart from using spot instances, which can be risky for critical workloads, Haptik also wanted the ability to choose which type of VMs to use – preemptible, on-demand, reserved, etc. A few months down the line there was a feature update from Spot that allowed Haptik to choose various specific types of VMs, hence giving Haptik not just cost savings, but more control over the type of infrastructure they run.
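To make the trade-off between VM types concrete, here is a minimal sketch of how a mixed fleet's hourly cost could be estimated. The prices and fleet sizes are hypothetical, the function names are illustrative, and the 90% discount is the upper bound quoted above rather than a guaranteed rate.

```python
def blended_hourly_cost(spot_count, on_demand_count, on_demand_price, spot_discount):
    """Hourly cost of a fleet that mixes spot and on-demand VMs.

    spot_discount is the fraction saved on a spot VM relative to the
    on-demand price (0.9 means "90% cheaper"). All inputs are assumptions.
    """
    if not 0.0 <= spot_discount <= 1.0:
        raise ValueError("spot_discount must be between 0 and 1")
    spot_price = on_demand_price * (1.0 - spot_discount)
    return spot_count * spot_price + on_demand_count * on_demand_price


def savings_fraction(spot_count, on_demand_count, spot_discount):
    """Fraction saved versus running the whole fleet on-demand."""
    total = spot_count + on_demand_count
    if total == 0:
        raise ValueError("fleet must contain at least one VM")
    return spot_count * spot_discount / total


if __name__ == "__main__":
    # Hypothetical fleet: 90 spot VMs, 10 on-demand VMs, 1.00 per on-demand hour.
    cost = blended_hourly_cost(90, 10, on_demand_price=1.0, spot_discount=0.9)
    saved = savings_fraction(90, 10, spot_discount=0.9)
    print(f"hourly cost: {cost:.2f}, saved: {saved:.0%}")
```

Keeping a small on-demand share lowers the achievable savings but protects critical workloads from interruption, which is the kind of control described above.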
Scaling got smarter
Elastigroup enabled auto-scaling based on metrics such as CPU, memory, and network requests, and even allowed for custom metrics. This meant that VMs did not need to be scaled for every request; instead, the VMs or Spot Elastigroups could be scaled only when a particular metric showed that the system was not performing as it should. Elastigroup allowed Haptik to define specific target policies and scale clusters based on the conditions defined by these policies. This greatly simplified the process of scaling and shifted the responsibility from the team to a purpose-built solution – Elastigroup.
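The idea behind a target policy can be sketched with the generic proportional rule used by many autoscalers. This is an illustration only, not Elastigroup's actual algorithm or API; the function name, parameters, and limits are assumptions.

```python
import math


def desired_capacity(current_instances, metric_value, target_value,
                     min_instances, max_instances):
    """Return the instance count that should bring a metric back to its target.

    Generic target-tracking rule (an assumption, not Spot's implementation):
    scale the fleet in proportion to how far the metric is from the target,
    then clamp the result to the configured limits.
    """
    if current_instances <= 0 or target_value <= 0:
        raise ValueError("current_instances and target_value must be positive")
    raw = math.ceil(current_instances * metric_value / target_value)
    return max(min_instances, min(max_instances, raw))


if __name__ == "__main__":
    # Hypothetical: 10 VMs at 90% average CPU with a 60% CPU target.
    print(desired_capacity(10, 90.0, 60.0, min_instances=2, max_instances=200))
```

With a rule like this, the fleet only changes size when the metric drifts away from its target, rather than on every request.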
This, coupled with Elastigroup’s built-in scheduling feature, allowed Haptik to scale up during the day, when they experience peak traffic, and to scale down at night, when traffic is low. Additionally, Elastigroup could automatically register newly created spot instances and gracefully deregister old ones, saving the operations team even more manual work when scaling. All this meant that scaling was not only cheaper and more controlled, but also more intelligent.
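A day/night schedule of this kind reduces to a simple lookup. The sketch below is illustrative only: the window boundaries and capacities are hypothetical values, not Haptik's configuration, and it does not use any Spot API.

```python
from datetime import time

# Hypothetical peak window; the real schedule is not given in this article.
DAY_START = time(8, 0)
DAY_END = time(22, 0)


def scheduled_capacity(now, day_capacity, night_capacity):
    """Pick the target capacity for the current time of day.

    The peak window is half-open: DAY_START is inside it, DAY_END is not.
    """
    in_peak_window = DAY_START <= now < DAY_END
    return day_capacity if in_peak_window else night_capacity


if __name__ == "__main__":
    print(scheduled_capacity(time(12, 30), day_capacity=40, night_capacity=10))
    print(scheduled_capacity(time(2, 0), day_capacity=40, night_capacity=10))
```

A scheduled baseline like this can be combined with metric-based scaling, so the schedule sets the expected capacity and the metrics handle unexpected spikes.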
Scaling got bigger
As Haptik started to grow, they were seeing peak requests up to 180,000 concurrent users. This meant that they had to create 150-200 nodes in a matter of 2 minutes to support the load. Adding this many nodes manually would have been painful. This is where Spot stepped in and made it effortless to add 200 preemptible VMs in under 2 minutes. The team at Haptik was impressed and even surprised by not just the speed of adding these instances, but how cost-effective they were.
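The figures above imply a rough per-node load: 180,000 concurrent users spread over 150-200 nodes is about 900 to 1,200 users per node. A minimal sketch of that arithmetic follows; the per-node capacities are derived from the numbers in this article, not measured values, and the function name is illustrative.

```python
import math


def nodes_needed(concurrent_users, users_per_node):
    """Number of nodes required to serve a given number of concurrent users."""
    if users_per_node <= 0:
        raise ValueError("users_per_node must be positive")
    return math.ceil(concurrent_users / users_per_node)


if __name__ == "__main__":
    peak_users = 180_000
    # Per-node capacities implied by the 150-200 node range quoted above.
    for capacity in (900, 1_200):
        print(capacity, nodes_needed(peak_users, capacity))
```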
Over the past three years, Haptik went from being a B2C to a B2B setup while migrating to AWS. They went from running 20-30 VMs to having many more VMs running by default, and scaling to hundreds of VMs in minutes.
The Results
Using Spot enabled Haptik to transition to AWS ECS (Elastic Container Service), while still mostly using the same Elastigroups with very minimal changes.
Simplified continuous deployment
One of the key benefits of using Spot was seen in deployments, as Spot simplified the entire process of continuous deployment. It was as simple as giving Spot an AMI ID and clicking on the deploy button. The deployment would act like a Canary deployment where you had the option to define how many of the instances you wanted to deploy – 10%, 20%, 30%, etc. The rollout feature of Spot has tremendously helped Haptik streamline its continuous deployment.
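The percentage-based rollout can be pictured as splitting the fleet into batches. The sketch below only illustrates the batch arithmetic; it is not Spot's deployment API, and the function name and the rounding-up behaviour are assumptions.

```python
def rollout_batches(total_instances, batch_percent):
    """Split a fleet into rollout batches of roughly batch_percent each.

    batch_percent is an integer percentage (for example 10, 20, 30).
    Batch sizes are rounded up so that every batch replaces at least
    one instance; the final batch takes whatever remains.
    """
    if total_instances < 0:
        raise ValueError("total_instances must not be negative")
    if not 0 < batch_percent <= 100:
        raise ValueError("batch_percent must be in the range 1-100")
    # Integer ceiling of total_instances * batch_percent / 100.
    batch_size = max(1, (total_instances * batch_percent + 99) // 100)
    batches = []
    remaining = total_instances
    while remaining > 0:
        size = min(batch_size, remaining)
        batches.append(size)
        remaining -= size
    return batches


if __name__ == "__main__":
    # Hypothetical fleet of 20 instances rolled out 30% at a time.
    print(rollout_batches(20, 30))
```

Smaller percentages mean more batches and a slower rollout, but fewer instances are exposed to a bad AMI at any one time.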
Additionally, Elastigroup’s ability to integrate with any tool being used in Haptik’s CI/CD pipeline made integration and configuration seamless. This is important in a world where the CI/CD pipeline is becoming more crowded with best-of-breed tools that all need to be well-integrated with each other.
Overall resiliency
Currently, at least 90-91% of Haptik’s workloads run entirely on preemptible VMs on AWS. Normally, this would be a huge risk, as AWS can reclaim preemptible VMs at short notice. However, Elastigroup is able to predict ahead of time when this will happen and migrate all data to new preemptible instances, ensuring that workloads face zero disruption and that sufficient IPs are available within all selected availability zones. Using Elastigroup has helped Haptik build more resilient and reliable systems in their pre-production and production environments.
Note: Onnivation, partner and value-added distributor of Spot by NetApp in Asia, has been instrumental in understanding the customer’s requirements and driving the engagement end-to-end to optimise the customer’s compute costs.
What does the future hold?
Today, the biggest driving force for Haptik is to become a Kubernetes-first company. One of the key ways to achieve this is to reduce their reliance on ECS and EKS. Haptik has adopted Azure AKS and now runs a multicloud setup. Their goal is to have the flexibility to operate on any underlying cloud platform while still having the consistency that Kubernetes brings. They are happy that Spot helps here too, as it supports both AWS and Azure, enabling Haptik to be cloud-agnostic.