Kubernetes lets you automate many management tasks, including provisioning and scaling. Instead of manually allocating resources, you can create automated processes that save time, let you respond quickly to peaks in demand, and conserve costs by scaling down when resources are not needed. Pod-level autoscaling can also be used alongside the cluster autoscaler, so the cluster allocates only the resources that are actually needed.
The Kubernetes autoscaling mechanism operates at two layers:
This is part of our series of articles about Kubernetes.
In this article, you will learn:
When the level of application usage changes, you need a way to add or remove pod replicas. Once configured, the Horizontal Pod Autoscaler manages workload scaling automatically.
HPA can be useful for both stateless applications and stateful workloads. HPA is managed by the Kubernetes controller manager and runs as a control loop. The interval of this loop is set by a controller manager flag, --horizontal-pod-autoscaler-sync-period, which defaults to 15 seconds.
After each loop period, the controller manager compares actual resource utilization to the metrics defined for each HPA. It obtains these from the custom metrics API or, if you specify that autoscaling should be based on per-pod resources (such as CPU utilization), from the resource metrics API.
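As a sketch of what this looks like in practice, the manifest below targets average CPU utilization via the resource metrics API; the Deployment name "web" and the 50% target are illustrative assumptions, not values from this article.

```yaml
# Illustrative HPA manifest using the resource metrics API (autoscaling/v2).
# The target Deployment name "web" and the 50% CPU target are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # scale out when average CPU usage exceeds 50% of requests
```

After applying a manifest like this, `kubectl get hpa` shows the current versus target utilization and the replica count the controller has chosen.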
HPA uses metrics to determine autoscaling, as follows:
Avoid using HPA alongside the Vertical Pod Autoscaler (VPA) on CPU or memory when both are evaluating the same CPU or memory metrics. Additionally, when using a Deployment, configure HPA on the Deployment itself, not on the underlying ReplicaSet or ReplicationController.
Here are two key best practices for efficiently using HPA:
The Vertical Pod Autoscaler uses live usage data to set container resource requests (and, proportionally, limits).
Most containers' actual usage adheres more closely to their initial requests than to their upper limits. As a result, Kubernetes' default scheduler, which places pods based on requests, overcommits a node's memory and CPU. To deal with this, the VPA raises and lowers the requests made by pod containers to keep them in line with actual usage and the available memory and CPU resources.
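The gap the scheduler works with is visible in an ordinary container resources block; the values below are illustrative assumptions, not recommendations.

```yaml
# Illustrative container resources block (fragment of a pod spec).
# The scheduler reserves only the "requests" values on a node; "limits" cap
# burst usage, so the gap between the two is what lets nodes be overcommitted.
resources:
  requests:
    cpu: 250m        # what the scheduler reserves
    memory: 256Mi
  limits:
    cpu: "1"         # what the container may burst to
    memory: 512Mi
```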
Some workloads need only short periods of high utilization. Raising their requests by default would waste resources the rest of the time and would limit the nodes able to run those workloads. HPA may help in some of these cases, but in others the application may not easily support distributing its load across multiple instances.
A VPA deployment calculates target values by monitoring resource utilization, using its recommender component. Its updater component evicts pods that must be updated with new resource requests. Finally, the VPA admission controller overwrites the resource requests of pods as they are recreated, using a mutating admission webhook.
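A minimal VPA object looks like the sketch below; it assumes the VPA components are already installed in the cluster and that the workload being managed is a Deployment named "web".

```yaml
# Illustrative VerticalPodAutoscaler object. Requires the VPA components
# (recommender, updater, admission controller) to be installed in the cluster.
# The target Deployment name "web" is an assumption.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Auto"   # "Off" only records recommendations; "Auto" evicts and recreates pods
```

Setting updateMode to "Off" first is a common way to review the recommender's suggestions before letting the updater evict pods.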
Updating running pods is still experimental in VPA, and performance in large clusters remains untested. VPA reacts to most out-of-memory events, but not all, and the behavior of multiple VPA resources that match the same pod remains undefined. Finally, VPA recreates pods when updating pod resources, possibly on a different node. As a result, all running containers restart.
Here are two best practices for making efficient use of Vertical Pod Autoscaler:
The Cluster Autoscaler changes the number of nodes in a cluster, whereas HPA scales the number of running pods. The Cluster Autoscaler looks for unschedulable pods, and also tries to consolidate pods that are spread across underutilized nodes onto fewer nodes. It loops through these two tasks constantly.
Pods become unschedulable when there are inadequate memory or CPU resources, or when a pod cannot match any existing node because of node taints it does not tolerate (taints keep pods off particular nodes), its affinity rules (rules attracting a pod to particular nodes), or its nodeSelector labels. If a cluster contains unschedulable pods, the autoscaler checks the managed node pools to see whether adding a node would unblock the pod. If so, and the node pool can be enlarged, it adds a node to the pool.
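For instance, the pod sketched below can only land on a node labeled disktype=ssd that has 4Gi of allocatable memory free and, if it is dedicated to batch work, a matching taint; if no existing node satisfies these constraints, the pod stays Pending and the autoscaler considers adding a node. The label, taint, and resource values are illustrative assumptions.

```yaml
# Illustrative pod that may be unschedulable: it requires a node labeled
# disktype=ssd with at least 4Gi of free memory, and tolerates a hypothetical
# "dedicated=batch" taint. If no existing node matches, the cluster autoscaler
# checks whether adding a node to a managed pool would let it schedule.
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker
spec:
  nodeSelector:
    disktype: ssd
  tolerations:
  - key: dedicated
    operator: Equal
    value: batch
    effect: NoSchedule
  containers:
  - name: worker
    image: busybox:1.36
    command: ["sleep", "3600"]
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
```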
The autoscaler also scans the nodes in managed pools for nodes whose pods could be rescheduled onto other available nodes in the cluster. If it finds such a node, it evicts the pods and removes the node. When moving pods, the autoscaler takes pod priority and PodDisruptionBudgets into consideration.
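A PodDisruptionBudget like the sketch below limits how many replicas may be evicted at once during such consolidation; the app=web selector and minAvailable value are assumptions for illustration.

```yaml
# Illustrative PodDisruptionBudget: pods matching app=web will not be evicted
# if doing so would leave fewer than 2 of them available.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web
```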
When scaling down, the Cluster Autoscaler allows a 10-minute graceful termination period before it forces a node to terminate. This gives the node's pods time to be rescheduled onto another node.
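These timings are governed by cluster-autoscaler flags. The fragment below shows where they would appear in the autoscaler container's command; the surrounding Deployment spec is omitted and the values shown are the upstream defaults, so treat this as a sketch rather than a recommended configuration.

```yaml
# Fragment of a cluster-autoscaler container command (Deployment spec omitted).
# Values shown are the upstream defaults.
command:
- ./cluster-autoscaler
- --scale-down-unneeded-time=10m        # how long a node must be unneeded before removal
- --max-graceful-termination-sec=600    # how long to wait for pods to terminate gracefully
```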
The Cluster Autoscaler is supported out of the box only on certain managed Kubernetes platforms; if your platform is not supported, consider installing it yourself. The Cluster Autoscaler does not support local PersistentVolumes, and it cannot scale up a node group from size 0 for pods that require ephemeral storage when the node group uses local SSDs.
Here are two best practices for making efficient use of Cluster Autoscaler:
There are other methods you can use to scale workloads in Kubernetes. Here are two common methods:
For users who don’t want to take this DIY approach to scaling cluster infrastructure, Spot by NetApp’s serverless container engine, Ocean, does all the work for you. Spot Ocean provides autoscaling that reads the requirements of pending pods in real time. This pod-driven autoscaling serves three goals:
With Cluster Autoscaler, users need a good understanding of their containers’ needs. Spot Ocean, however, monitors events at the Kubernetes API server and dynamically allocates infrastructure based on container requirements (CPU, memory, networking) while honoring specific labels, taints, and tolerations. As an out-of-the-box solution, Spot Ocean does not require users to configure or maintain individual scaling groups, nor does it restrict them from using mixed instance types or multiple AZs in a scaling group.
Understanding Kubernetes Cluster Autoscaler: Features, Limitations and Alternatives
There are different tools and mechanisms for scaling applications and provisioning resources in Kubernetes. Kubernetes’ native horizontal and vertical pod autoscaling (HPA and VPA) handle scaling at the application level. However, when it comes to the infrastructure layer, Kubernetes does not carry out infrastructure scaling itself. Understand Kubernetes infrastructure autoscaling tools, including the Kubernetes Cluster Autoscaler, to get started with scaling your container clusters.
Read more: Understanding Kubernetes Cluster Autoscaler: Features, Limitations and Alternatives
Kubernetes Deployment: The Basics and 4 Useful Deployment Strategies
Kubernetes provides capabilities that help you efficiently orchestrate containerized applications. This is mainly achieved through the automation of provisioning processes. A Kubernetes Deployment enables you to automate the behavior of pods. Instead of manually maintaining the application lifecycle, you can use Deployments to define how behavior is automated. Learn how a Kubernetes Deployment works, considerations for using it, and four useful strategies including blue-green and canary deployment.
Read more: Kubernetes Deployment: The Basics and 4 Useful Deployment Strategies
Kubernetes StatefulSets: Scaling and Managing Persistent Applications
A Kubernetes StatefulSet is a workload API resource object. There are several built-in Kubernetes workload resources, each designed for certain purposes. StatefulSets are designed to help you efficiently manage stateful applications. Learn how Kubernetes StatefulSets can help you define, scale, and manage persistent applications on Kubernetes.
Read more: Kubernetes StatefulSets: Scaling and Managing Persistent Applications
Kubernetes ReplicaSet: Kubernetes Scalability Explained
A ReplicaSet (RS) is a Kubernetes object that ensures there is always a stable set of running pods for a specific workload. The ReplicaSet configuration defines the number of identical pods required and, if a pod is evicted or fails, creates more pods to compensate for the loss. Learn how Kubernetes ReplicaSets work, discover 3 types of Kubernetes replication, and see a quick tutorial on creating a deployment with a ReplicaSet.
Read more: Kubernetes ReplicaSet: Kubernetes Scalability Explained
Kubernetes Daemonset: A Practical Guide
DaemonSet is a Kubernetes feature that lets you run a Kubernetes pod on all cluster nodes that meet certain criteria. Every time a new node is added to a cluster, the pod is added to it, and when a node is removed from the cluster, the pod is removed. When a DaemonSet is deleted, Kubernetes removes all the pods created by it. Learn how DaemonSets work, how to perform common operations like creating and scheduling a DaemonSet, and the difference between StatefulSets and DaemonSets.
Read more: Kubernetes Daemonset: A Practical Guide
6 Kubernetes Deployment Strategies: Roll Out Like the Pros
A Kubernetes Deployment allows you to declaratively create pods and ReplicaSets. You define a desired state, and a Deployment controller continuously monitors the current state of the relevant resources and deploys pods to match the desired state. Deployments play a central role in Kubernetes autoscaling. Understand how Kubernetes Deployment strategies work, and discover advanced strategies such as ramped slow rollout, controlled rollout, blue/green, and canary deployments.
Together with our content partners, we have authored in-depth guides on several other topics that can also be useful as you explore the world of Kubernetes.