Managing and optimizing Kubernetes clusters on elastic infrastructure is a continuous effort for engineering and DevOps teams. While Kubernetes is a container-centric orchestration service that simplifies the deployment of containerized applications, managing its underlying infrastructure at scale still adds overhead.
Not too long ago, we announced Ocean, our serverless compute engine that abstracts containers from VMs by dynamically selecting compute instances that fit the resource requirements of Pods and containers. Building on our vision for Ocean, we saw an opportunity to help our end users define correct Pod resource requirements with even greater accuracy and avoid typical pitfalls such as over- or under-provisioning.
In this blog, we introduce our latest development, Ocean’s VPA (Vertical Pod Auto Scaling). This capability helps our customers tune their CPU and Memory requests based on actual Pod consumption, in order to increase cluster utilization and lower operational cloud costs.
Challenges in estimating a Pod resource request
Developers configuring Pod resource requests either guess, provide an estimate by trial and error or run simulations with a test deployment.
However, the time and effort that development teams need to invest to measure their application’s CPU and Memory consumption accurately can be extensive, and the result is often still inaccurate.
Even after developers dedicate time and resources to establish an accurate measurement, test simulation metrics will almost always differ from actual production usage.
Moreover, production resource consumption invariably changes over time, which widens the gap from the initial estimate.
These factors led us at Spot.io to develop a system that collects actual CPU and Memory usage in order to define the most accurate resource requests for Kubernetes Pods.
Why accurately defining Pod Resource requirements matters
Kubernetes lets you define resource guidelines for your containers. While this setting is not mandatory, it is highly recommended to define resource requests and resource limits, based on specific CPU and Memory needs, in order to avoid over- or under-provisioning your Kubernetes Pods. Incorrect provisioning can lead to:
- Over-provisioning: a Pod requests more resources than it actually needs in order to operate. The cluster is left underutilized, with idle resources, which increases operational costs.
- Under-provisioning: a Pod consumes more resources than it requested. The cluster is over-committed, which leads to performance issues, and in extreme conditions Kubernetes may terminate your Pods.
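To make the two cases concrete, here is a minimal Python sketch that compares a CPU request with observed usage. The function names and the sample numbers are illustrative assumptions, not part of Ocean or Kubernetes.

```python
def headroom(requested_millicores: int, used_millicores: int) -> int:
    """Reserved-but-unused CPU (positive) or usage beyond the request (negative)."""
    return requested_millicores - used_millicores


def describe(requested_millicores: int, used_millicores: int) -> str:
    """Label a request as over-provisioned, under-provisioned, or matching usage."""
    gap = headroom(requested_millicores, used_millicores)
    if gap > 0:
        return f"over-provisioned: {gap}m reserved but idle"
    if gap < 0:
        return f"under-provisioned: {-gap}m used beyond the request"
    return "request matches usage"


print(describe(2000, 1000))  # over-provisioned: 1000m reserved but idle
print(describe(500, 800))    # under-provisioned: 300m used beyond the request
```

The same comparison applies to Memory, using bytes or MiB instead of millicores.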
What is a Pod resource request
A resource request is the amount of CPU and Memory that the Kubernetes scheduler reserves for the Pod’s operation. A resource limit is the maximum amount of CPU and Memory that the system allows the Pod to consume. When a Pod attempts to consume more than its defined limit, it is restricted.
For CPU, restriction means throttling: the container is not given CPU time beyond its limit.
For Memory, a container that exceeds its limit can be terminated with an OOM (out of memory) kill. This enforcement happens on the node and is not performed by the scheduler.
The Kubernetes scheduler uses the defined resource requests to decide which node to place the Pod on, and validates that the node has enough unreserved resources for the Pod to launch. When this check fails, i.e. no node in the cluster can run the Pod, the Pod enters an “unscheduled” state. This is where Ocean kicks in and spins up a new node that fits the Pod, based on its resource requests and other parameters such as labels, node selectors and more.
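The resource check can be pictured with a heavily simplified Python sketch. It assumes a hypothetical `Node` record that tracks unreserved capacity and only looks at CPU and Memory requests; the real scheduler applies many more filters and also scores the candidate nodes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    free_cpu_m: int    # allocatable CPU not yet reserved by requests, in millicores
    free_mem_mib: int  # allocatable memory not yet reserved by requests, in MiB


def pick_node(nodes: List[Node], cpu_request_m: int, mem_request_mib: int) -> Optional[Node]:
    """Return the first node whose unreserved capacity covers the Pod's requests."""
    for node in nodes:
        if node.free_cpu_m >= cpu_request_m and node.free_mem_mib >= mem_request_mib:
            return node
    return None  # no node fits, so the Pod stays unscheduled


nodes = [Node("node-a", 1500, 4096), Node("node-b", 500, 8192)]
print(pick_node(nodes, 1000, 512).name)  # node-a
print(pick_node(nodes, 2000, 512))       # None
```

Note that the check uses requests, not live usage: a node full of over-provisioned Pods looks full to the scheduler even when it is mostly idle.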
VPA Process Flow
Metrics Server is required
In order to collect the metrics required for Ocean VPA, you need to install the Kubernetes Metrics Server, which runs as part of your cluster.
Using the metric server, Spotinst Ocean will collect the usage metrics for all deployments in the cluster periodically, once every 5 minutes. Spotinst Ocean requires at least 4 days of data in order to present actual usage.
For each resource (CPU and Memory), Spotinst Ocean will calculate the average amount of resource utilization in the past 2 weeks as a benchmark for actual consumption.
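As a rough illustration of the numbers above, the following Python sketch averages 5-minute samples over a two-week window and returns nothing until 4 days of data exist. The function and constant names are assumptions made for this example; Ocean’s internal implementation is not shown here.

```python
from statistics import mean
from typing import List, Optional

SAMPLES_PER_DAY = 24 * 60 // 5         # one sample every 5 minutes -> 288 per day
MIN_SAMPLES = 4 * SAMPLES_PER_DAY      # at least 4 days of data
WINDOW_SAMPLES = 14 * SAMPLES_PER_DAY  # average over the past 2 weeks


def average_usage(samples: List[float]) -> Optional[float]:
    """Average the most recent two weeks of samples, or None if data is insufficient."""
    if len(samples) < MIN_SAMPLES:
        return None
    return mean(samples[-WINDOW_SAMPLES:])


print(average_usage([1000.0] * 100))          # None (less than 4 days of samples)
print(average_usage([1000.0] * MIN_SAMPLES))  # 1000.0
```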
Resource resizing recommendations will be triggered for one of the following scenarios:
- The requested resources are above/below the Avg. metric in all timeframes by more than 15%
- The requested resources are continuously above/below the Avg. metric in the last month by more than 30%
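One possible reading of these two rules, sketched in Python. The assumptions here are ours: the gap is measured relative to the average, "all timeframes" is modeled as a dictionary of per-timeframe averages, and "continuously in the last month" is simplified to a single monthly average. Ocean’s actual evaluation may differ.

```python
from typing import Dict


def deviation(requested: float, average: float) -> float:
    """Signed gap between the request and the average usage, relative to the average."""
    return (requested - average) / average


def should_recommend(requested: float, averages: Dict[str, float], last_month_avg: float) -> bool:
    """Apply the 15% all-timeframes rule and the 30% last-month rule."""
    gaps = [deviation(requested, avg) for avg in averages.values()]
    all_above = bool(gaps) and all(g > 0.15 for g in gaps)
    all_below = bool(gaps) and all(g < -0.15 for g in gaps)
    month_gap = abs(deviation(requested, last_month_avg))
    return all_above or all_below or month_gap > 0.30


# Request of 2000m against averages of roughly 1000m: resize is recommended.
print(should_recommend(2000, {"day": 1000, "week": 1050, "two_weeks": 990}, 1000))  # True
# Request of 1000m that tracks usage closely: no recommendation.
print(should_recommend(1000, {"day": 950, "week": 1100}, 1000))                     # False
```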
Resource recommendations will be provided as follows:
1) Ocean UI
2) Bi-Weekly summary email (configurable)
Running a sample application
For the purpose of this test, we have set up an EKS cluster integrated with Spotinst Ocean.
Let’s create a deployment with this stress.yml:
# apps/v1 is used here because extensions/v1beta1 no longer serves Deployments (removed in Kubernetes 1.16)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
  namespace: default
  labels:
    Kubernetes-app: stress
spec:
  replicas: 3
  selector:
    matchLabels:
      Kubernetes-app: stress
  template:
    metadata:
      labels:
        Kubernetes-app: stress
    spec:
      containers:
      - name: stress
        image: progrium/stress
        imagePullPolicy: Always
        # run one CPU worker per pod, i.e. roughly 1 vCPU of load
        args: ["--cpu", "1"]
        resources:
          limits:
            cpu: "5000m"
            memory: "3Gi"
          requests:
            cpu: "2000m"
            memory: "512Mi"
Next, apply the stress deployment to the cluster:
kubectl apply -f stress.yml
- The pods were configured to consume 1 vCPU each. To validate:
kubectl top pod --all-namespaces
- We can also validate the CPU in the Ocean UI: the average CPU utilization for the ‘api-service’ deployment is 1 vCPU per pod, while the CPU request is 2 vCPU
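To relate the "2000m" request in stress.yml to the vCPU figures above, here is a small Python sketch that converts a Kubernetes CPU quantity to vCPUs. The helper name is an assumption, and it only handles the plain ("2") and millicore ("2000m") forms.

```python
def cpu_to_vcpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "2000m" or "2" to vCPUs."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # millicores: 1000m equals 1 vCPU
    return float(quantity)


request = cpu_to_vcpu("2000m")  # the CPU request from stress.yml
usage = 1.0                     # average usage per pod, as reported above
print(request)                                          # 2.0
print(f"{usage / request:.0%} of the request is used")  # 50% of the request is used
```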
Acknowledging the suggestion and applying it with ‘kubectl’
Ocean’s VPA suggestions will begin providing recommendations after collecting cluster usage metrics for a period of 1 week.
Since the ‘api-service’ deployment is consuming fewer resources than requested, we received a resize recommendation. The recommended value is the average of the actual pods’ resource utilization, over the past two weeks.
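- Use kubectl to apply the change (note the resource type and the "m" suffix: 1000m is 1 vCPU, whereas a bare 1000 would request 1000 vCPUs):
kubectl set resources deployment api-service --requests=cpu=1000m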
Review the rollout progress from the command line, for example with kubectl rollout status deployment/api-service.
Ocean identifies that the resources allocated on the nodes have decreased, and scales down as needed.
In the Ocean logs, we get full visibility into the simulation that the Auto Scaling activity performed in order to binpack the running pods into fewer nodes:
After applying the recommendation, the cluster is better utilized and all of our Pods are still running. They now request fewer resources, which reduced the infrastructure cost of our cluster without having to run the simulations described earlier.
In this exercise, we created a deployment with a generous CPU Request configured, yielding idle capacity. After reducing the CPU resource requests from 2 vCPU to 1 vCPU, Ocean terminated an instance after validating that all the Pods in the cluster can remain operational after this scale down activity.
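The arithmetic behind this result is simple enough to sketch in Python, using the replica count and request values from stress.yml. Whether the freed capacity is enough to remove a node depends on the instance sizes and on the other Pods in the cluster, which this sketch does not model.

```python
REPLICAS = 3          # replicas in stress.yml
OLD_REQUEST_M = 2000  # original CPU request per pod, in millicores
NEW_REQUEST_M = 1000  # recommended CPU request per pod, in millicores


def reserved_cpu(replicas: int, request_m: int) -> int:
    """Total CPU reserved by a deployment, in millicores."""
    return replicas * request_m


before = reserved_cpu(REPLICAS, OLD_REQUEST_M)
after = reserved_cpu(REPLICAS, NEW_REQUEST_M)
print(f"{before}m -> {after}m, freeing {before - after}m")  # 6000m -> 3000m, freeing 3000m
```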
We are excited to introduce Ocean’s VPA, another addition to Ocean’s infrastructure management capabilities.
With the visibility that Ocean VPA provides, you can run a more efficient, better performing and more cost-effective Kubernetes cluster. Identifying the right amount of CPU and Memory for every Pod and Deployment no longer has to be a guess.
VPA is now integrated as part of Spotinst Ocean UI Console and API in all regions.
Get started today and use the chatbot if you want to engage with one of our colleagues for more information!
We will soon announce our own custom HPA (Horizontal Pod Auto Scaling), which will provide additional scaling elasticity to your Ocean cluster. Stay tuned!