Kubernetes Autoscaler


Scaling Kubernetes nodes is not an easy task, so we are happy to announce a new service that removes all of this complexity and frustration: the Elastigroup Kubernetes Autoscaler.

The Challenge of Scaling Containers on top of Kubernetes

A Kubernetes cluster is managed by a Kubernetes master, which is responsible for maintaining the cluster's desired state. When it comes to scaling containers, you need to make sure the cluster scales efficiently. Consider an example: your k8s cluster runs 10 c3.large nodes (2 vCPUs and 3.8 GiB of RAM each) and 10 c4.xlarge nodes (4 vCPUs and 7.5 GiB of RAM each). That gives 60 vCPUs in total, or 60 × 1024 = 61,440 CPU units, and about 113 GiB of RAM.
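The totals above can be checked with a few lines of arithmetic. This minimal Python sketch also reports the memory of the largest single node, which matters because a pod always runs on one node and cannot borrow capacity from others. The node list mirrors the example; the function names are illustrative only.

```python
# Cluster from the example above. c3.large memory is rounded to 3.8 GiB, as in the text.
NODES = [
    # (node count, instance type, vCPUs per node, memory per node in GiB)
    (10, "c3.large", 2, 3.8),
    (10, "c4.xlarge", 4, 7.5),
]

CPU_UNITS_PER_VCPU = 1024  # 1 vCPU = 1024 CPU units


def cluster_totals(nodes):
    """Return (total vCPUs, total CPU units, total memory in GiB) for the cluster."""
    vcpus = sum(count * cpu for count, _, cpu, _ in nodes)
    memory_gib = sum(count * mem for count, _, _, mem in nodes)
    return vcpus, vcpus * CPU_UNITS_PER_VCPU, memory_gib


def largest_node_memory(nodes):
    """Memory of the biggest single node: a pod requesting more than this cannot be scheduled."""
    return max(mem for _, _, _, mem in nodes)


if __name__ == "__main__":
    vcpus, cpu_units, memory_gib = cluster_totals(NODES)
    print(f"{vcpus} vCPUs = {cpu_units} CPU units, {memory_gib:.0f} GiB of RAM")
    print(f"Largest single node: {largest_node_memory(NODES)} GiB of RAM")
```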

However, what happens if a single pod requires 16 GiB of RAM? Even though the cluster has plenty of RAM and CPU in aggregate, the pod still won't start, because a pod must fit on a single node and the largest node here has only 7.5 GiB. To solve that, allow me to introduce the two most important concepts of scaling containerized applications:

  1. Tetris Scaling
  2. Headroom

Tetris Scaling

When a pod is launched in Kubernetes, the scheduler tries to find a node with enough free capacity to run it. In some cases, however, no node has enough resources to meet the pod's demands. When a pod fails to start, Kubernetes writes events that describe why the pod could not run. The most common error events are:

No nodes are available that match all of the following predicates:: 
PodToleratesNodeTaints

No nodes are available that match all of the following predicates::
Insufficient cpu, PodToleratesNodeTaints.

No nodes are available that match all of the following predicates::
Insufficient memory, PodToleratesNodeTaints
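To illustrate what "detecting these events" involves, here is a minimal Python sketch that splits such a message into its failed predicates and extracts the resources reported as insufficient. The helper names are hypothetical, and this is not Elastigroup's implementation; the exact wording of these messages also differs between Kubernetes versions, so real tooling should not depend on this precise string.

```python
# Message prefix as it appears in the events above (note the double colon).
PREFIX = "No nodes are available that match all of the following predicates::"


def failed_predicates(message):
    """Split a scheduler event message into the list of predicates that failed."""
    if not message.startswith(PREFIX):
        return []
    rest = message[len(PREFIX):].strip().rstrip(".")
    return [p.strip() for p in rest.split(",") if p.strip()]


def insufficient_resources(message):
    """Return the resources (e.g. "cpu", "memory") reported as insufficient."""
    return [
        p[len("Insufficient "):]
        for p in failed_predicates(message)
        if p.startswith("Insufficient ")
    ]


if __name__ == "__main__":
    for msg in (
        PREFIX + " PodToleratesNodeTaints",
        PREFIX + " Insufficient cpu, PodToleratesNodeTaints.",
        PREFIX + " Insufficient memory, PodToleratesNodeTaints",
    ):
        print(failed_predicates(msg), insufficient_resources(msg))
```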

Elastigroup's Kubernetes Autoscaler automatically detects these events and launches additional instances of the relevant size and type. For example, if a pod requires 10 GiB of RAM and the Elastigroup is configured with four instance types [m3.large, c3.large, c4.large, m3.xlarge], the Autoscaler picks the instance type with enough RAM and CPU to meet the pod's requirements, in this case m3.xlarge (the only one of the four with more than 10 GiB of RAM).
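The instance-type choice can be sketched as a filter over the configured types. This minimal Python version assumes a hypothetical tie-breaking policy (smallest memory first, then CPU) and ignores the memory and CPU a node reserves for the kubelet and system pods; the source only says the Autoscaler picks a type that satisfies the pod's requirements.

```python
# Published AWS sizes for the four instance types in the example:
# name -> (CPU units, memory in GiB), with 1 vCPU = 1024 CPU units.
INSTANCE_TYPES = {
    "m3.large": (2 * 1024, 7.5),
    "c3.large": (2 * 1024, 3.75),
    "c4.large": (2 * 1024, 3.75),
    "m3.xlarge": (4 * 1024, 15.0),
}


def pick_instance_type(cpu_units, memory_gib, types=INSTANCE_TYPES):
    """Return the smallest configured type that can hold the pod, or None if none can.

    "Smallest" is an assumed policy: compare by memory first, then CPU, then name.
    Node overhead (kubelet, system pods) is ignored for simplicity.
    """
    candidates = [
        (mem, cpu, name)
        for name, (cpu, mem) in types.items()
        if cpu >= cpu_units and mem >= memory_gib
    ]
    return min(candidates)[2] if candidates else None


if __name__ == "__main__":
    print(pick_instance_type(cpu_units=1024, memory_gib=10))  # m3.xlarge
```

Returning `None` when nothing fits mirrors the case where no configured type is large enough for the pod, which a real autoscaler would have to surface rather than silently launch the wrong size.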

Headroom

Headroom is a buffer of spare capacity (in terms of both memory and CPU) that ensures that whenever we want to scale out more pods, we don't have to wait for new instances, while also preventing instances from being over-utilized. Each headroom unit consists of two definitions: one for CPU units, “cpuPerUnit” (1024 units = 1 vCPU), and one for memory, “memoryPerUnit” (in MiB). In addition, you define the number of headroom units you want to reserve in the cluster.

For example, say we define a headroom unit as 512 MiB of memory and 1024 CPU units, and require a total of 10 units, on a cluster of 3 instances. The Autoscaler sums the free headroom units across the entire cluster and checks whether the total meets the number configured for the group. If the first instance has 2 whole free headroom units (in our example, at least 1024 MiB and 2048 CPU units), the second has 3 whole units and the third has 5, then the cluster has a total of 10 free headroom units, and no scale-up is performed. However, if the first instance has 2 free units and the second has 3, but the third has only 4, then the cluster has a total of 9 free units. Since the group requires 10, this triggers a scale-up activity.
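The headroom arithmetic from this example can be sketched in a few lines of Python. It assumes each instance's free capacity is already known as (free CPU units, free memory in MiB); how the Autoscaler actually measures free capacity is not described here, so those inputs and the function names are hypothetical. An instance only counts a whole unit when it has room for both that unit's CPU and its memory.

```python
def whole_free_units(free_cpu_units, free_memory_mib, cpu_per_unit, memory_per_unit):
    """Headroom units an instance can hold: each unit needs BOTH its CPU and its memory."""
    return min(free_cpu_units // cpu_per_unit, free_memory_mib // memory_per_unit)


def headroom_check(instances, cpu_per_unit, memory_per_unit, num_of_units):
    """Return (total free units, whether a scale-up is needed).

    `instances` is a list of (free CPU units, free memory in MiB) per instance.
    """
    total = sum(
        whole_free_units(cpu, mem, cpu_per_unit, memory_per_unit)
        for cpu, mem in instances
    )
    return total, total < num_of_units


if __name__ == "__main__":
    # Unit = 1024 CPU units + 512 MiB, 10 units required, as in the example.
    enough = [(2048, 1024), (3072, 1536), (5120, 2560)]  # 2 + 3 + 5 units
    short = [(2048, 1024), (3072, 1536), (4096, 2048)]   # 2 + 3 + 4 units
    print(headroom_check(enough, 1024, 512, 10))  # (10, False)
    print(headroom_check(short, 1024, 512, 10))   # (9, True)
```

Taking the minimum per instance means an instance with lots of idle CPU but little free memory contributes only as many units as its memory allows.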

Get Started

  1. Go to Elastigroup -> Edit -> Compute, expand the “Integrations” section in the Compute tab, and enable the Kubernetes integration.
  2. Enter the IP/FQDN of your Kubernetes master and a token. For more information, see: Creating Kubernetes Bearer Token
  3. Enter the scaling properties:
    • Cooldown – The time (in seconds) after a scaling activity completes before another scaling activity can start
    • Evaluation Period – The number of consecutive periods an instance must be idle before it is scaled down
    • Headroom – The number of headroom units to keep available at all times in your Kubernetes cluster
    • CPU Headroom – The number of CPU units reserved for each headroom unit
    • Memory Headroom – The amount of memory (in MiB) reserved for each headroom unit

API & CloudFormation configuration

"group": {
  ...
  "thirdPartiesIntegration": {
    "kubernetes": {
      "apiServer": "https://example.host.com",
      "token": "abcd1234",
      "autoScale": {
        "isEnabled": true,
        "cooldown": 180,
        "down": {
          "evaluationPeriods": 5
        },
        "headroom": {
          "cpuPerUnit": 1024,
          "memoryPerUnit": 256,
          "numOfUnits": 10
        }
      }
    }
  }
  ...
}
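The same fragment can be generated programmatically, which makes the mapping between the console fields (Cooldown, Evaluation Period, Headroom, CPU Headroom, Memory Headroom) and the JSON keys explicit. This is a minimal Python sketch; the function name `kubernetes_integration` is hypothetical, and it only builds the `thirdPartiesIntegration` fragment. Merging it into the rest of the group definition and sending it to the API are not shown.

```python
import json


def kubernetes_integration(api_server, token, cooldown, evaluation_periods,
                           cpu_per_unit, memory_per_unit, num_of_units):
    """Build the thirdPartiesIntegration fragment shown above (hypothetical helper)."""
    return {
        "thirdPartiesIntegration": {
            "kubernetes": {
                "apiServer": api_server,
                "token": token,
                "autoScale": {
                    "isEnabled": True,
                    "cooldown": cooldown,  # seconds between scaling activities
                    "down": {"evaluationPeriods": evaluation_periods},
                    "headroom": {
                        "cpuPerUnit": cpu_per_unit,        # 1024 = 1 vCPU
                        "memoryPerUnit": memory_per_unit,  # MiB
                        "numOfUnits": num_of_units,
                    },
                },
            }
        }
    }


if __name__ == "__main__":
    fragment = kubernetes_integration(
        "https://example.host.com", "abcd1234", 180, 5, 1024, 256, 10
    )
    print(json.dumps(fragment, indent=2))
```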

Scale Down

The Autoscaler monitors the cluster and looks for idle instances. An instance is considered idle if both its CPU and memory utilization are below 40%. When an instance has been idle for the configured number of consecutive evaluation periods, Elastigroup first verifies that other instances in the cluster have enough spare capacity for its pods. It then drains the instance, reschedules its pods on other instances, and terminates the idle instance.
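The scale-down decision can be sketched as follows. Assumptions: utilization is reported as a fraction between 0 and 1 once per evaluation period, "idle" means both CPU and memory are below 40%, and the spare-capacity check is a simple first-fit-decreasing placement of the idle instance's pod requests onto the other instances' free capacity. The names `IdleTracker` and `fits_elsewhere` are hypothetical, and this is not Elastigroup's actual algorithm.

```python
IDLE_THRESHOLD = 0.40  # below 40% CPU and memory utilization counts as idle


def is_idle(cpu_util, mem_util, threshold=IDLE_THRESHOLD):
    """Assumed rule: an instance is idle only if BOTH utilizations are under the threshold."""
    return cpu_util < threshold and mem_util < threshold


class IdleTracker:
    """Counts consecutive idle periods per instance (hypothetical helper)."""

    def __init__(self, evaluation_periods):
        self.evaluation_periods = evaluation_periods
        self.streak = {}

    def observe(self, instance_id, cpu_util, mem_util):
        """Record one period; return True once the instance is idle for enough periods in a row."""
        if is_idle(cpu_util, mem_util):
            self.streak[instance_id] = self.streak.get(instance_id, 0) + 1
        else:
            self.streak[instance_id] = 0  # a busy period resets the count
        return self.streak[instance_id] >= self.evaluation_periods


def fits_elsewhere(pods, spare):
    """First-fit decreasing: can every pod (cpu, mem) be placed on the spare (cpu, mem) slots?

    This heuristic may answer False even when some other placement exists.
    """
    remaining = [list(s) for s in spare]
    for cpu, mem in sorted(pods, reverse=True):
        for slot in remaining:
            if slot[0] >= cpu and slot[1] >= mem:
                slot[0] -= cpu
                slot[1] -= mem
                break
        else:
            return False  # no instance had room for this pod
    return True
```

Checking placement pod by pod, rather than comparing cluster-wide totals, matters for the same reason as in the 16 GiB example: spare capacity spread across many instances does not help a pod that needs more than any single instance has free.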