

Scaling Kubernetes nodes is not an easy task, so we are happy to announce a new service that removes this complexity and frustration: the Elastigroup “Kubernetes AutoScaler”.
The Challenge of Scaling Containers on top of Kubernetes
A Kubernetes cluster is built around a Kubernetes master that is responsible for maintaining the desired state of your cluster. When it comes to scaling containers, you need to ensure that the cluster scales efficiently. Let’s take a look at an example: say your k8s cluster runs 10 nodes of c3.large (2 vCPUs and 3.8 GiB of RAM each) and 10 nodes of c4.xlarge (4 vCPUs and 7.5 GiB of RAM each). Your total CPU capacity is then 60 vCPUs, or 60 * 1024 = 61,440 CPU units, and your total RAM is 113 GiB.
However, what happens if a single pod requires 16 GiB of RAM? Even though the cluster as a whole has plenty of RAM and CPU, the pod still won’t start, because a pod must fit on a single node and no node here has 16 GiB of RAM. To solve that, please allow me to introduce the two most important concepts of scaling containerized applications:
- Tetris Scaling
- Headroom
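The arithmetic in the example cluster can be checked with a short Python sketch. It assumes empty nodes and ignores system overhead; the `NodeType` class and the `pod_fits_on_some_node` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

CPU_UNITS_PER_VCPU = 1024  # 1 vCPU = 1024 CPU units, as used in this article


@dataclass(frozen=True)
class NodeType:
    name: str
    vcpus: int
    ram_gib: float


# Cluster from the example: 10 x c3.large and 10 x c4.xlarge.
cluster = [
    (NodeType("c3.large", 2, 3.8), 10),
    (NodeType("c4.xlarge", 4, 7.5), 10),
]

total_cpu_units = sum(t.vcpus * n for t, n in cluster) * CPU_UNITS_PER_VCPU
total_ram_gib = sum(t.ram_gib * n for t, n in cluster)
largest_node_ram_gib = max(t.ram_gib for t, _ in cluster)


def pod_fits_on_some_node(pod_ram_gib: float) -> bool:
    """A pod runs on exactly one node, so aggregate RAM does not help."""
    return pod_ram_gib <= largest_node_ram_gib


print(total_cpu_units)            # total CPU units in the cluster
print(round(total_ram_gib))       # total RAM in GiB
print(pod_fits_on_some_node(16))  # can a 16 GiB pod be placed anywhere?
```

The totals look generous, yet the per-node check is the one that decides whether the 16 GiB pod can be scheduled.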
Tetris Scaling
When a pod is launched in Kubernetes, the scheduler tries to find free capacity for the pod to run. In some cases, however, there aren’t enough resources to meet the pod’s demands. When a pod fails to start, Kubernetes writes events that describe why the pod could not be scheduled. The most common error events are:
- No nodes are available that match all of the following predicates:: PodToleratesNodeTaints
- No nodes are available that match all of the following predicates:: Insufficient cpu, PodToleratesNodeTaints
- No nodes are available that match all of the following predicates:: Insufficient memory, PodToleratesNodeTaints
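To make the structure of these messages concrete, here is a minimal Python sketch that extracts which resources an event reports as insufficient. It only illustrates the message format quoted above; the `missing_resources` function is hypothetical and is not a description of how Elastigroup parses events internally.

```python
import re

# Prefix copied from the scheduler events quoted above.
PREFIX = "No nodes are available that match all of the following predicates::"


def missing_resources(message: str) -> set[str]:
    """Return the resources (e.g. "cpu", "memory") reported as insufficient."""
    if not message.startswith(PREFIX):
        return set()  # not a scheduling-failure event of this form
    return set(re.findall(r"Insufficient (\w+)", message))


event = PREFIX + " Insufficient memory, PodToleratesNodeTaints"
print(missing_resources(event))
```

An event that mentions only PodToleratesNodeTaints yields an empty set, since no resource shortage is reported in that case.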
Elastigroup’s Kubernetes Autoscaler automatically detects these events and launches additional instances of the relevant size and type. For example, if a pod requires 10 GiB of RAM and the Elastigroup is configured with 4 different instance types [m3.large, c3.large, c4.large, m3.xlarge], the Autoscaler will pick the instance type with enough RAM and CPU to meet the pod’s requirements. In this case that is m3.xlarge, the only one of the four with more than 10 GiB of RAM.
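The selection in this example can be sketched in Python as follows. This is an illustration under assumptions, not the Autoscaler's actual algorithm: it picks the smallest configured type that fits the pod, uses the published AWS sizes for the four types, assumes a 1 vCPU request for the pod (the example only states its RAM), and ignores system overhead on the node. `InstanceType` and `pick_instance_type` are hypothetical names.

```python
from typing import NamedTuple, Optional


class InstanceType(NamedTuple):
    name: str
    vcpus: int
    ram_gib: float


# The four types configured in the example, with published AWS sizes.
GROUP_TYPES = [
    InstanceType("m3.large", 2, 7.5),
    InstanceType("c3.large", 2, 3.75),
    InstanceType("c4.large", 2, 3.75),
    InstanceType("m3.xlarge", 4, 15.0),
]


def pick_instance_type(pod_vcpus: float, pod_ram_gib: float) -> Optional[InstanceType]:
    """Smallest configured type that satisfies the pod, or None if none does."""
    candidates = [
        t for t in GROUP_TYPES
        if t.vcpus >= pod_vcpus and t.ram_gib >= pod_ram_gib
    ]
    return min(candidates, key=lambda t: (t.ram_gib, t.vcpus), default=None)


choice = pick_instance_type(pod_vcpus=1, pod_ram_gib=10)
print(choice.name if choice else "no configured type fits")
```

If no configured type is large enough, the function returns None, which corresponds to a pod that cannot be placed with the current group configuration.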
Headroom
Headroom is a buffer of spare capacity (in terms of both memory and CPU). It ensures that when we want to schedule more pods we don’t have to wait for new instances, and it also prevents instances from being over-utilized. Each headroom unit consists of two definitions: one for CPU units, “cpuPerUnit” (1024 units = 1 vCPU), and one for memory, “memoryPerUnit” (in MiB). In addition, you define the number of headroom units you want to reserve in the cluster.
For example, let’s say that we define the headroom unit to consist of 512 MiB of memory and 1024 CPU units, and require a total of 10 units. On top of that, let’s assume the cluster consists of 3 instances. The AutoScaler sums the whole free units across the entire cluster and checks whether the sum meets the number configured for the group. If the first instance has 2 whole free headroom units (in our example, at least 1024 MiB and 2048 CPU units), the second instance has 3 whole units, and the third has 5 whole units, then the cluster has a total of 10 free headroom units and no scale-up is performed. However, if the first instance has 2 free units, the second has 3, and the third has only 4, then the cluster has a total of 9 free units. Since the group requires 10 free units, this triggers a scale-up activity.
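The headroom check described here can be sketched in a few lines of Python. This is an illustration of the counting rule only (whole units per instance, summed across the cluster); the function names and the way free capacity is represented are assumptions.

```python
def whole_units(free_cpu_units: int, free_memory_mib: int,
                cpu_per_unit: int, memory_per_unit: int) -> int:
    """Whole headroom units on one instance: limited by the scarcer resource."""
    return min(free_cpu_units // cpu_per_unit, free_memory_mib // memory_per_unit)


def needs_scale_up(nodes, cpu_per_unit: int, memory_per_unit: int,
                   num_of_units: int) -> bool:
    """True if the cluster-wide sum of whole free units is below the target."""
    total = sum(whole_units(cpu, mem, cpu_per_unit, memory_per_unit)
                for cpu, mem in nodes)
    return total < num_of_units


# Headroom unit from the example: 1024 CPU units and 512 MiB, 10 units required.
# Each node is (free CPU units, free memory in MiB).
enough = [(2048, 1024), (3072, 1536), (5120, 2560)]  # 2 + 3 + 5 units
short = [(2048, 1024), (3072, 1536), (4096, 2048)]   # 2 + 3 + 4 units

print(needs_scale_up(enough, 1024, 512, 10))
print(needs_scale_up(short, 1024, 512, 10))
```

Only whole units count: an instance with 4500 free CPU units and 2600 MiB free contributes 4 units, because CPU is the limiting resource.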
Get Started
- Go to Elastigroup -> Edit -> Compute, expand the “Integrations” portion in the Compute tab, and enable the Kubernetes integration
- Insert the IP/FQDN of your Kubernetes Master and the token (for more information, please see: Creating Kubernetes Bearer Token)
- Insert the scaling properties:
  - Cooldown – The time (in seconds) after a scaling activity completes before another scaling activity can start
  - Evaluation Period – The number of consecutive periods that should pass before scaling down
  - Headroom – The number of headroom units to keep available at all times on your Kubernetes cluster
  - CPU Headroom – The number of CPU units reserved for each headroom unit
  - Memory Headroom – The amount of memory (in MiB) reserved for each headroom unit
API & CloudFormation configuration
"group": {
  ...
  "thirdPartiesIntegration": {
    "kubernetes": {
      "apiServer": "https://example.host.com",
      "token": "abcd1234",
      "autoScale": {
        "isEnabled": true,
        "cooldown": 180,
        "down": {
          "evaluationPeriods": 5
        },
        "headroom": {
          "cpuPerUnit": 1024,
          "memoryPerUnit": 256,
          "numOfUnits": 10
        }
      }
    }
  }
  ...
}
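If you generate this configuration from a script, a small helper like the Python sketch below can build the "thirdPartiesIntegration" fragment. The field names are taken from the configuration above; the function name, its defaults, and the validation rule are assumptions for illustration, and the sketch does not call any API.

```python
import json


def kubernetes_integration(api_server: str, token: str, *,
                           cooldown: int = 180,
                           evaluation_periods: int = 5,
                           cpu_per_unit: int = 1024,
                           memory_per_unit: int = 256,
                           num_of_units: int = 10) -> dict:
    """Build the thirdPartiesIntegration fragment shown above (hypothetical helper)."""
    numbers = (cooldown, evaluation_periods, cpu_per_unit, memory_per_unit, num_of_units)
    if any(n <= 0 for n in numbers):
        raise ValueError("all scaling properties must be positive")
    return {
        "kubernetes": {
            "apiServer": api_server,
            "token": token,
            "autoScale": {
                "isEnabled": True,
                "cooldown": cooldown,  # seconds
                "down": {"evaluationPeriods": evaluation_periods},
                "headroom": {
                    "cpuPerUnit": cpu_per_unit,        # 1024 units = 1 vCPU
                    "memoryPerUnit": memory_per_unit,  # MiB
                    "numOfUnits": num_of_units,
                },
            },
        }
    }


fragment = kubernetes_integration("https://example.host.com", "abcd1234")
print(json.dumps({"thirdPartiesIntegration": fragment}, indent=2))
```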
Scale Down
The Autoscaler monitors the cluster and finds idle instances. An instance is considered idle if it has less than 40% CPU and memory utilization. When an instance has been idle for the specified number of consecutive evaluation periods, Elastigroup first verifies that there is enough spare capacity on other instances in the cluster. It then drains the instance’s pods, reschedules them on other instances, and terminates the idle instance.
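The idle rule can be sketched in Python as follows. This is one reading of the description, assuming that both CPU and memory must be below 40% and that utilization is sampled once per evaluation period. The function names are hypothetical, and the spare-capacity check and pod draining are not modeled.

```python
IDLE_THRESHOLD = 0.40  # below 40% utilization counts as idle


def is_idle(cpu_util: float, mem_util: float) -> bool:
    """Idle only if both CPU and memory utilization are under the threshold."""
    return cpu_util < IDLE_THRESHOLD and mem_util < IDLE_THRESHOLD


def idle_long_enough(samples, evaluation_periods: int) -> bool:
    """True if the most recent `evaluation_periods` samples are all idle.

    `samples` is a list of (cpu_util, mem_util) pairs, oldest first.
    """
    if evaluation_periods <= 0:
        raise ValueError("evaluation_periods must be positive")
    if len(samples) < evaluation_periods:
        return False  # not enough history yet
    return all(is_idle(cpu, mem) for cpu, mem in samples[-evaluation_periods:])


history = [(0.70, 0.50), (0.30, 0.20), (0.10, 0.35), (0.25, 0.30),
           (0.05, 0.10), (0.20, 0.15)]
print(idle_long_enough(history, evaluation_periods=5))
```

With "evaluationPeriods": 5, a single busy sample among the last five resets the decision, so short dips in load do not trigger a scale-down.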