There are different tools and mechanisms for autoscaling in Kubernetes at both the application and infrastructure layers to help users manage their cluster resources. In this article, we’ll explore two infrastructure autoscaling tools for Kubernetes — Ocean by Spot and open source Cluster Autoscaler. We’ll provide an overview of:
- Autoscaling in Kubernetes
- What Ocean is and how it works
- What Cluster Autoscaler is
- Cluster Autoscaler considerations
- How Ocean and Cluster Autoscaler compare
On the pod level, the Vertical Pod Autoscaler (VPA) allocates resources by monitoring pod resource utilization and increasing or decreasing CPU and memory requests. The Horizontal Pod Autoscaler (HPA) measures metrics on deployments and scales by replicating pods across the cluster based on application demand. For effective autoscaling, however, you also need to address the cluster level and scale the underlying infrastructure.
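As an illustration of pod-level autoscaling, here is a minimal HorizontalPodAutoscaler manifest (the Deployment name `web` is hypothetical) that keeps average CPU utilization near 70% by scaling a Deployment between 2 and 10 replicas:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web            # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add replicas when average CPU exceeds 70%
```

Note that the HPA only adds or removes pod replicas; if the cluster has no free node capacity for the new replicas, they stay pending until the infrastructure layer scales up.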
Ocean by Spot is a fully managed Kubernetes data plane service that provides a serverless infrastructure engine for running containers. Leveraging pod-driven auto scaling, Ocean dynamically allocates compute infrastructure based on container requirements. It’s designed to work in such a way that pods and workloads can take advantage of the underlying capabilities of cloud compute infrastructure such as pricing, lifecycle, performance, and availability without having to know anything about it.
Ocean takes responsibility for the underlying infrastructure and provides users with a number of features that help them manage their container cluster resources effectively and efficiently, including the following:
- With out-of-the-box nodes of varying types and sizes, users don’t have to configure or maintain individual scaling groups
- Ocean dynamically scales infrastructure and allocates the best-fit instances based on the scale and shape of pods, as well as any labels, taints or tolerations
- Events are monitored at the Kubernetes API server, affording levels of visibility and flexibility that can’t otherwise be achieved, ensuring dependable performance and fast scalability
- Ocean maintains a scoring model for compute capacity markets to significantly reduce interruptions and efficiently leverage cloud pricing models (spot instance, on-demand, and reserved instances) for up to a 90% cost reduction
Cluster Autoscaler is an open-source project that automatically scales a Kubernetes cluster based on the scheduling status of pods and the resource utilization of nodes. It does this by leveraging your cloud provider’s auto scaling capabilities, delivered via node pools. Cluster Autoscaler matches pods with configured node groups and scales when an appropriate match is found. Within AWS, Cluster Autoscaler leverages Auto Scaling Groups (ASGs) and Spot Fleet to deliver infrastructure, and it also works with GCP and Azure, amongst others. Cluster Autoscaler is actively maintained by the upstream Kubernetes community and released in line with the corresponding Kubernetes release.
- Mixed instance types can be used in a node group, but instances need to have the same capacity (CPU and memory)
- With no option to fall back to on-demand instances, using spot instances can present performance and availability risks
- Auto Scaling Groups are managed independently by the user
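To make the ASG coupling concrete, here is a sketch of the relevant container arguments from a typical Cluster Autoscaler Deployment on AWS. The image tag and cluster name are placeholders; the `--expander` flag chooses among the node-group selection strategies (random, most-pods, least-waste):

```yaml
# Excerpt of a Cluster Autoscaler Deployment spec (AWS); names and versions are illustrative
spec:
  containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.2
    command:
    - ./cluster-autoscaler
    - --cloud-provider=aws
    - --expander=least-waste              # or: random, most-pods
    - --balance-similar-node-groups
    # Discover ASGs tagged for this cluster instead of listing them one by one:
    - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
```

The Auto Scaling Groups themselves (instance types, min/max sizes, Availability Zones) still have to be created and maintained by the user in AWS; the tags above only let Cluster Autoscaler find them.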
In order to help make an informed decision between Ocean and Cluster Autoscaler, we have cataloged the architectural differences between the two. Given that Node Pools are delivered via cloud provider autoscaling resources, it seems best to compare Cluster Autoscaler when backed by each cloud provider’s native capabilities separately.
Within AWS, Cluster Autoscaler leverages Auto Scaling Groups (ASGs) and Spot Fleet to deliver the infrastructure.
Cluster Autoscaler with AWS Auto Scaling Groups
| Feature | Cluster Autoscaler (with AWS ASGs) | Ocean |
|---|---|---|
| Mixed instance types | CA works with multiple node groups in one of three ways (randomly, by most pods, or by least waste), based on the ASGs added by the user. Support is limited, however: mixed instance types within a node group must have the same capacity (CPU and memory). | Supported. Ocean supports mixed instance types across all families by default, using an algorithm that determines the right instance size/type to provision based on the pending unscheduled pods, historical use and cost. |
| Availability Zone awareness | A single Auto Scaling Group cannot span multiple Availability Zones without consideration for rebalancing. Alternatively, one can manage one Auto Scaling Group per Availability Zone. | Supported. With Ocean, the Kubernetes cluster is managed by a single entity that handles all underlying instances, across multiple configurations, irrespective of the AZ. |
| Persistent Volume Claims (PVC) awareness | Node groups must be configured with an Auto Scaling Group that is tied to a single Availability Zone. | Supported. Ocean reads the requirements of pending pods in real time; if a pod needs a volume in a particular AZ, Ocean launches the instance in that AZ. No additional management is needed. |
| Fallback to on-demand | Neither CA nor ASG/Spot Fleet offers a fallback to on-demand instances. | Supported. Ocean falls back to on-demand instances when there is a shortage in spot capacity pools, and automatically reverts to spot instances once spot capacity becomes available. |
| Scale down and re-shuffling pods | Scale down is condition-based: a node is removed when its utilization falls below a threshold and all of its pods can be rescheduled onto other nodes. | Ocean’s scale down takes all of CA’s considerations into account and couples them with instance size for bin-packing efficiency, resulting in roughly a 30% smaller cluster footprint compared to CA. Every 5 minutes, Ocean runs a pod bin-packing pass to determine whether any underutilized nodes can be terminated by shifting running pods onto other available nodes. |
| Spot interruption handling | To handle spot interruptions, one needs to install the aws/spot-interruption-handler DaemonSet. | Available by default in the Spot SaaS platform; no extra tools need to be installed in the cluster. Interruptions are predicted and managed automatically. |
| Fast, high-performance auto scaling to meet immediate application demand | CA supports DIY over-provisioning to deliver workload-based headroom: run placeholder pods with very low priority (see Priority Preemption) that hold resources other pods can use. When resources run short, the placeholder pods are preempted and new pods take their place; the placeholders then become unschedulable and force CA to scale up the cluster. | Supported. Ocean automatically calculates a “cluster headroom” parameter, so the cluster always has space for incoming pods without waiting for new nodes to show up. Headroom is configurable via REST API, Terraform or CloudFormation; alternatively, just specify a percentage of cluster resources, or a fixed amount of CPU/memory, to keep as a buffer. |
| Cloud vendor support | AWS, GCP, Azure | AWS, GCP, Azure |
| Node group management | Auto Scaling Groups must be managed independently by the user and associated with the cluster using labels. | Ocean automatically and dynamically scales infrastructure as needed. |
| GPU support | Requires setting up and labelling additional GPU-based node pools to support different types and sizes. | Supported out of the box. When pods come in, Ocean launches the most suitable GPU-enabled EC2 instance by matching the pods’ GPU resource requirements and taking cost and availability into account. |
| Node template generation | The upstream autoscaler always uses an existing node as the template for a node group; the cloud provider information (e.g. a launch configuration) is only consulted if the node group is empty. Additionally, only the first node of each node group is selected, which may or may not be up to date. | Ocean’s “Launch Specification” serves as a predictable source of truth for the node template. Launch specifications provide a large set of optional properties to configure and can be updated at will. |
| Actual available compute resources calculation | The memory actually available on a node is usually smaller than what the AWS API reports, and resources reserved by the system (via --system-reserved and --kube-reserved) are not accounted for. The upstream autoscaler ignores these factors when scaling out, and it does not properly consider kube-proxy’s resource requirements (it assumes a static amount of CPU). | Ocean predicts the actual available resources on every scale-out activity, based on all running nodes in the cluster, avoiding a loop of scale-ups that never get the pending workloads scheduled. |
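The DIY over-provisioning approach described above can be sketched as a pair of manifests: a very low (here negative) PriorityClass, plus a Deployment of pause-container placeholder pods that reserve capacity. The names, replica count and resource requests are illustrative and should be sized to the headroom you actually want:

```yaml
# A priority low enough that any real workload preempts these pods
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10
globalDefault: false
description: "Placeholder pods that hold headroom; preempted by real workloads."
---
# Pause pods that do nothing but occupy the reserved CPU/memory
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 2              # headroom = 2 x (500m CPU, 512Mi memory)
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
```

When a real pod cannot be scheduled, the scheduler evicts placeholder pods to make room; the now-pending placeholders then trigger Cluster Autoscaler to add a node, replenishing the buffer.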