Challenge: Keeping Cloud Costs Low While Managing Disruptions in a Microservices Framework

Carbon processes over 1 billion events every day, including intent signals, context, brand affinity, browsing behavior and demographics. By integrating with other parts of the marketing technology stack, such as ad servers, demand-side platforms (DSPs) and content management platforms, Carbon turns those events into insights and actionable data points.

To support the growing requirements of their platform, Carbon’s engineering group realized that their existing cloud architecture was gradually becoming insufficient, so they decided to move Carbon’s back-end infrastructure to a containerized microservices architecture.

 

Choosing AKS for Managed Kubernetes Services

The Carbon team settled on Kubernetes as their container orchestration platform, but sought a better way to build and manage Kubernetes clusters in the cloud. The solution needed to scale quickly during unpredictable peak traffic hours and support a complex work queue to keep their application environment clean post-execution.

Carbon was looking for a way to deliver new services faster, in order to provide more value for its customers. The solution required more agility, elasticity and the ability to quickly and dynamically scale up and down while maintaining the lowest costs possible.

The Carbon team ultimately deployed their Kubernetes clusters on top of AKS (Azure Kubernetes Service), Microsoft’s managed Kubernetes service, as the control plane for their entire K8s infrastructure. “Carbon has been working with Microsoft for many years, so choosing AKS as our Kubernetes managed service was a natural choice for us,” noted Alistair McLean, CTO at Carbon.

After completing the migration to Kubernetes, Carbon realized that the last missing piece in their cloud architecture was the ability to provision the underlying infrastructure seamlessly and in the most cost-efficient way. Due to the nature of their business, Carbon required a solution that would support fast yet simple infrastructure auto-scaling when the application hits unpredictable peaks in traffic from their global customers throughout the business day.

 

Azure Low-Priority VMs: DIY Causing Application Downtime

Carbon’s engineering team hoped to use Azure’s Low-Priority VMs as the underlying nodes hosting their Kubernetes clusters, in order to dramatically lower their cloud operational costs.

The main challenge with their homegrown solution for Low-Priority VMs was that detached machines were not drained properly when removed from the Kubernetes cluster, which resulted in occasional application downtime, due to synchronization issues that arose from pods not being scheduled correctly. 

 

Solution and Benefits: Automating AKS, Increasing Application Availability, All While Reducing Cost by 80%

To tackle the challenges of running AKS clusters on Low-Priority VMs, Carbon approached Spot to help manage and orchestrate the underlying infrastructure of their Kubernetes clusters. Spot’s solution automates the entire process of draining and cleaning up a VM when it is removed from the K8s cluster because a short-lived Low-Priority VM is being replaced.

Spot cordons the departing node so that no new pods are scheduled onto it, then migrates its containers and restarts them on different hosts. This eliminates the concern of the VM shutting down incorrectly and keeps the cluster state synchronized, which helps ensure that pods are scheduled properly and prevents application downtime.
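The cordon-and-drain sequence described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: the `Node` class, the `drain` function and the CPU-only placement rule are hypothetical names and simplifications introduced here, not Spot’s implementation or the Kubernetes API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a Kubernetes node backed by one VM."""
    name: str
    cpu_capacity: int                 # total CPU cores on the node
    schedulable: bool = True          # False once the node is cordoned
    pods: dict = field(default_factory=dict)  # pod name -> CPU request

    def free_cpu(self) -> int:
        return self.cpu_capacity - sum(self.pods.values())


def drain(node: Node, cluster: list) -> list:
    """Cordon `node`, then move each of its pods to another schedulable node.

    Returns the names of pods that could not be placed and stay pending.
    """
    node.schedulable = False  # cordon: no new pods may land on this node
    pending = []
    for pod, cpu in list(node.pods.items()):
        del node.pods[pod]  # evict the pod from the departing node
        # Pick the first schedulable node with enough free CPU (assumed rule).
        target = next(
            (n for n in cluster if n.schedulable and n.free_cpu() >= cpu),
            None,
        )
        if target is None:
            pending.append(pod)
        else:
            target.pods[pod] = cpu
    return pending
```

Draining a node that holds an `api` pod (2 CPUs) and a `worker` pod (1 CPU) into a second 4-CPU node that already runs a 1-CPU `cache` pod leaves the first node empty and unschedulable, with all three pods running on the second. Pods that fit nowhere are returned as pending, which is the signal an autoscaler would act on.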

Thanks to the seamless integration between Spot and AKS, Carbon was able to enjoy not only average discounts of 80% on their cloud-compute costs, but also a fully stable Kubernetes cluster. Whenever Low-Priority VMs are interrupted, the affected machines are properly drained and detached from the cluster. In cases where the Low-Priority VM market is unstable or unavailable for a particular VM type, Spot guarantees availability by automatically falling back to On-Demand. Alistair enthused, “It’s a turn-key product, we’ve allocated the nodes via Spot’s console, and since the initial configuration we didn’t really have to touch anything.”
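To see how an On-Demand fallback interacts with the discount, the arithmetic can be sketched as follows. The function name, the unit price of 1.0 and the share of hours spent on Low-Priority capacity are illustrative assumptions, not figures from Carbon or Spot; only the 80% average discount comes from the text above.

```python
def blended_hourly_cost(on_demand_price, discount, low_priority_share):
    """Average hourly cost per VM when a share of hours runs on discounted capacity.

    on_demand_price:    full hourly price of the VM (hypothetical unit price)
    discount:           Low-Priority discount as a fraction, e.g. 0.8 for 80%
    low_priority_share: fraction of hours served by Low-Priority VMs; the
                        remainder is assumed to fall back to On-Demand
    """
    if not 0 <= discount <= 1 or not 0 <= low_priority_share <= 1:
        raise ValueError("discount and low_priority_share must be between 0 and 1")
    low_priority_price = on_demand_price * (1 - discount)
    return (low_priority_share * low_priority_price
            + (1 - low_priority_share) * on_demand_price)
```

Under these assumptions, a VM that runs on Low-Priority capacity 90% of the time and falls back to On-Demand for the remaining 10% costs about 0.28 of the full On-Demand price per hour, so occasional fallbacks trade a small part of the savings for availability.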


Fast and Efficient Autoscaling is Key to System Stability

In addition, Spot’s Autoscaling technology was responsible for launching new VMs when pods were in a “pending-schedule” state, as well as keeping the cluster fully utilized by scaling down VMs with low container utilization and intelligently bin-packing containers over time. Spot continuously monitors pod metrics and scheduling needs (such as labels, taints, tolerations, and storage and network requirements) and scales the infrastructure accordingly, while making the most efficient use possible of the VMs. It is also tightly integrated with Horizontal and Vertical Pod Autoscaling (HPA & VPA). The fact that Spot’s autoscaling can add capacity for pending pods immediately contributed to Carbon’s application uptime and the overall stability of the system. “Working with Spot and indirectly with the Azure team has gained us a tighter integration between Kubernetes and AKS, all while running on Low-priority VMs,” added Alistair.
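The two autoscaling decisions described above, adding VMs for pending pods and removing underused ones, can be sketched in a few lines. This is an illustration only: first-fit decreasing is a common bin-packing heuristic chosen here for simplicity, not a description of Spot’s algorithm, and the function names, the CPU-only model and the 0.3 threshold are assumptions.

```python
def vms_needed(pending_cpu_requests, vm_cpu):
    """Estimate how many new VMs to launch for pending pods.

    Uses first-fit decreasing bin packing: place the largest requests first,
    each into the first planned VM that still has room.
    """
    free = []  # remaining CPU on each VM we plan to launch
    for request in sorted(pending_cpu_requests, reverse=True):
        if request > vm_cpu:
            raise ValueError(
                f"pod requesting {request} CPU cannot fit on a {vm_cpu}-CPU VM"
            )
        for i, remaining in enumerate(free):
            if remaining >= request:
                free[i] = remaining - request
                break
        else:
            # No planned VM has room, so plan one more.
            free.append(vm_cpu - request)
    return len(free)


def scale_down_candidates(utilization, threshold=0.3):
    """Return the names of nodes whose utilization (0..1) is below the threshold."""
    return sorted(name for name, used in utilization.items() if used < threshold)
```

For example, pending pods requesting 2, 1, 3 and 2 CPUs fit on two 4-CPU VMs, and with the assumed threshold a node at 10% utilization would be flagged for scale-down while one at 80% would not. A real autoscaler would also have to honor labels, taints, tolerations and storage or network constraints before moving anything.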

 

Visibility Into Azure Cloud Costs and AKS Cluster Activity

Furthermore, Spot provided the Carbon team with deeper visibility into what is going on in their Kubernetes clusters, in terms of:

  • CPU/Memory utilization of pods, VMs, and overall cluster health
  • Cost breakdown 
  • Management and monitoring 
  • Pod distribution across nodes 

 

Closing Comments from Carbon’s CTO, Alistair McLean

“We knew that choosing Spot for managing the underlying infrastructure of our Kubernetes clusters was an easy choice. During the POC we immediately witnessed a significant decrease in our cloud-compute spendings as well as higher efficiency of our Kubernetes clusters. In addition to that, empowering the Spot Controller with the minimum configuration from our side allowed us to enjoy seamless and smooth scaling activities, with deeper visibility into what’s going on inside the cluster in real-time. On top of all that, I must recognize Spot’s support team, which is available for us 24/7 for any issue. They are prompt, responsive, highly professional and effective, which makes you feel safe as you venture into this world.”

Carbon is a next-generation Data Management Platform (DMP) that offers any business with an online audience data-driven tools and solutions to help grow customers and revenue. Founded in early 2018, Carbon operates in both the US and the UK from its headquarters in the North East of England.

Carbon is now part of Magnite.

https://www.magnite.com/