Eliminate scaling lag without overprovisioning Kubernetes

Let’s get right to the point: most Kubernetes workloads are underutilizing CPU and memory, with 49% of containers using less CPU than their defined requests.

Although Kubernetes lets users declare resource requests and limits for each container's CPU and memory, those values are difficult to define and maintain for dynamic applications, especially in fast-scaling scenarios.
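
For reference, these requirements are declared per container in the pod spec. The manifest below is a minimal sketch; the image and values are illustrative placeholders, not recommendations:

    apiVersion: v1
    kind: Pod
    metadata:
      name: web
    spec:
      containers:
        - name: app
          image: registry.example.com/app:1.0   # placeholder image
          resources:
            requests:
              cpu: 250m        # the scheduler reserves a quarter of a vCPU
              memory: 256Mi    # and 256 MiB for this container
            limits:
              cpu: 500m        # throttled above half a vCPU
              memory: 512Mi    # OOM-killed above 512 MiB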

That difficulty makes the risk of overwork and burnout real, even for the most sophisticated teams. Many application developers are stretched between building sufficient Kubernetes expertise and managing their daily tasks. In fact, research suggests that developers in organizations that use Kubernetes spend an average of 4.5 hours writing application code and 16.5 hours on other tasks, such as maintaining internal tooling and debugging pipelines.

But without expertise and time, Kubernetes environments can quickly become cloud cost centers, running inefficiently with higher risk of downtime.

Did you just black out?

Some developers attempt to right-size applications by configuring resource requests through trial and error or by simulating a test deployment. Many simply default to significant overprovisioning to cover all the bases. Incorrect provisioning, however, can leave resources idle and drive up operational costs, or cause performance issues if the cluster doesn't have enough capacity to run the workload.

Another approach is to run certain Kubernetes workloads on spot instances, both for the enormous cost savings and because spot instances can help infrastructure run more efficiently. But spot market capacity, availability, and churn make this tricky. If request constraints can't be met or there isn't enough capacity, scale-up can be paused or delayed, causing a blackout.

Eliminating scaling lag and hassle

To address scaling lag on Elastic Kubernetes Service (EKS), AWS recommends overprovisioning worker nodes with dummy pods.

Each dummy pod contains a pause container, which kube-scheduler places according to the pod spec's placement rules and CPU/memory requests. The pause container then waits for a termination signal, which arrives when Kubernetes needs to preempt its capacity for a higher-priority workload. When a real workload's pods are created, kube-scheduler evicts the dummy pods from the worker nodes and schedules the real workload's pods on those nodes.
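
For preemption to work this way, the dummy pods must sit below every real workload in priority, which is typically done with a dedicated PriorityClass. A minimal sketch, with an arbitrary name:

    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: overprovisioning   # arbitrary name for this sketch
    value: -1                  # below the default priority of 0, so any real pod can preempt these
    globalDefault: false
    description: "Placeholder priority for pause (dummy) pods."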

How many dummy pods to overprovision is a trade-off between the scaling performance required of the worker nodes and the cost of running the dummy pods. For example, five pause pods that each request 1 vCPU and 2 GiB of memory keep 5 vCPU and 10 GiB of warm capacity ready, at the price of keeping those nodes running.

AWS recommends simplifying overprovisioning by creating a Deployment of pause-container pods and setting the replica count to a static value. The downside is that as a cluster grows (e.g., to hundreds or thousands of nodes), a static replica count for overprovisioning may no longer be effective.
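
A minimal sketch of that static approach, reusing the priority class above (the replica count, image tag, and resource sizes are placeholders to tune for your cluster):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: overprovisioning
    spec:
      replicas: 5                    # static headroom: five warm "slots"
      selector:
        matchLabels:
          app: overprovisioning
      template:
        metadata:
          labels:
            app: overprovisioning
        spec:
          priorityClassName: overprovisioning   # preemptible, per the PriorityClass above
          containers:
            - name: pause
              image: registry.k8s.io/pause:3.9  # does nothing; exists only to reserve capacity
              resources:
                requests:
                  cpu: "1"      # each replica holds one vCPU...
                  memory: 2Gi   # ...and 2 GiB of memory warm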

How Spot Ocean helps

Does all of this sound complicated? Well, it is.

That’s why Spot Ocean uses automation, driven by machine learning (ML) algorithms, to simplify cloud infrastructure scaling. By continuously analyzing how containers use infrastructure, Ocean automatically scales compute resources to maximize utilization and availability with the optimal blend of spot, reserved, and on-demand instances. This holds even for mission-critical and production workloads: Spot’s predictive algorithms analyze the spot instance markets and select only instances that will not risk the performance or availability of the application. These resources are provisioned based on each workload’s unique performance metrics, helping to improve DevOps productivity, workload reliability, and cost efficiency.

In fact, Ocean achieves 100% utilization across containerized workloads using ML-based bin packing. It does this by constantly simulating Kubernetes scheduler actions and working to satisfy Kubernetes resource needs, while ensuring appropriate headroom is configured on the workload to prevent resource exhaustion.
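
Ocean can compute this headroom automatically, and it can also be set manually in units of spare CPU, memory, and GPU. The excerpt below is only an illustrative sketch of the headroom settings exposed through Spot's Ocean API and Terraform provider; the field names and units are best-effort assumptions, so consult the Ocean documentation for the exact schema:

    # Illustrative sketch only; field names and units are assumptions to verify against Spot docs
    autoScaler:
      isEnabled: true
      headroom:
        cpuPerUnit: 1000      # millicores of spare CPU per headroom unit (assumed unit)
        memoryPerUnit: 2048   # MiB of spare memory per headroom unit (assumed unit)
        gpuPerUnit: 0
        numOfUnits: 4         # keep four such units of warm capacity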

Ocean considers the following Kubernetes configurations (an illustrative pod spec follows the list):

  • Resource requests (CPU, memory, and GPU)
  • nodeSelectors
  • Required affinity and anti-affinity rules
  • Taints and tolerations
  • Well-known labels, annotations, and taints
  • Spot proprietary labels and taints
  • The cluster-autoscaler.kubernetes.io/safe-to-evict: "false" annotation
  • Pod disruption budgets
  • Persistent Volumes and Persistent Volume Claims
  • Pod topology spread constraints
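
To make that concrete, here is a pod spec exercising several of those constructs. It is illustrative only; the names, image, zone, and taint key are placeholders:

    apiVersion: v1
    kind: Pod
    metadata:
      name: constrained-app
      labels:
        app: constrained-app
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"   # asks autoscalers not to evict this pod
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: us-east-1a    # well-known label
      tolerations:
        - key: dedicated        # hypothetical taint key
          operator: Equal
          value: gpu
          effect: NoSchedule
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname      # spread replicas across nodes
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: constrained-app
      containers:
        - name: app
          image: registry.example.com/app:1.0      # placeholder
          resources:
            requests:
              cpu: 500m
              memory: 1Gi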

This ensures graceful scale-up and scale-down scenarios, including container instance replacement when needed.

The result is the highest infrastructure availability, optimized for your unique workloads, at the lowest possible cost per compute unit.

A must-have in today’s market

Business leaders are under immense pressure to deliver results in the face of market challenges. Automation is an immediate way to improve efficiency and productivity while lowering cloud unit costs. According to the Harvard Business Review, automation is no longer a “nice-to-have”; it’s a must-have, because automation tools improve both business and employee performance. Automation also helps combat burnout and improve work-life balance, two critical retention strategies for companies in today’s shifting labor market.

Learn how Spot by NetApp can help you improve your cloud infrastructure management for AWS and request a demo today.