What are Kubernetes resource limits?
Kubernetes resource limits define the maximum amount of compute resources a container can use in a cluster. This includes limiting CPU and memory to ensure stability and performance. Without these limits, applications may consume all available resources, impacting other services.
By setting limits, administrators can create a more predictable environment and improve system reliability. Resource limits work alongside requests to manage resources: the request reserves capacity for a container, while the limit defines the ceiling of its usage. This mechanism serves as a control to prevent resource-intensive applications from degrading overall system performance.
This is part of a series of articles about Kubernetes architecture
In this article:
- Understanding resource requests and limits
- Resource types in Kubernetes
- Code example: Setting resource requests and limits
- Best practices for resource management in Kubernetes
Understanding resource requests and limits
Resource limits are often combined with resource requests, which are a related but distinct concept.
The difference between requests and limits
Requests and limits serve different purposes in Kubernetes. A resource request is the amount of a resource reserved for a container to function. The Kubernetes scheduler uses these requests to allocate resources and place pods on nodes that can fulfill these demands, ensuring availability of resources.
Resource limits define the upper threshold of resources a container is allowed to consume. If a container exceeds its CPU limit, Kubernetes throttles it; if it exceeds its memory limit, the container may be terminated. This distinction provides a safeguard against resource hogging, protecting system integrity.
How requests affect scheduling
Kubernetes uses resource requests to make informed scheduling decisions. When a pod is created, the scheduler evaluates available nodes based on the requested resources. It selects nodes that can accommodate these demands while balancing the load across the cluster.
If a node cannot meet a pod’s request, the scheduler will place it elsewhere; if no suitable node is found, the pod remains in the Pending state until capacity becomes available. This approach ensures that critical components receive the necessary resources up front.
How limits affect runtime behavior
Resource limits have a direct effect during runtime. By setting limits, administrators can prevent a container from consuming more resources than desired. Exceeding a limit might result in Kubernetes throttling the container’s resource usage, which can lead to degraded application performance.
Memory limits are enforced more strictly: a container that exceeds its memory limit may be terminated (OOM-killed) to free up resources. This ensures that excessive resource consumption by one component does not impact the availability and performance of other applications in the system, maintaining cluster stability.
Related content: Read our guide to Kubernetes monitoring
Resource types in Kubernetes
There are several types of Kubernetes resources that may require limits.
CPU resources
In Kubernetes, CPU is a primary resource type that dictates how much processing power a container can utilize. CPU resources are measured in CPU units, where one unit equals one physical CPU core or one virtual core (vCPU), depending on the environment. Requests guarantee a certain CPU share, while limits restrict the maximum CPU time a container can utilize.
Containers exceeding their CPU limit face throttling, slowing down execution. Proper configuration of CPU requests and limits ensures applications perform without over-allocating resources. This prevents individual containers from causing a performance bottleneck.
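CPU quantities can be written either as a fraction of a core or in millicores; the two notations are interchangeable. A minimal sketch (the pod name and image are placeholders for illustration):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cpu-demo      # hypothetical name for illustration
spec:
  containers:
  - name: app
    image: nginx      # placeholder image
    resources:
      requests:
        cpu: "250m"   # 250 millicores, equivalent to 0.25
      limits:
        cpu: "1"      # one full core, equivalent to "1000m"
```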
Memory resources
Memory resources in Kubernetes refer to the RAM available to containers. Unlike CPU, memory resources are not compressible—if a container’s memory usage exceeds its limit, it may be terminated. Management of memory resources is crucial to prevent out-of-memory errors.
To ensure ample memory usage without overshooting, Kubernetes uses resource requests and limits. Requests set the baseline RAM needed by a container, while limits define the maximum. Proper settings prevent applications from encroaching on each other’s memory.
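Memory quantities use binary suffixes (Ki, Mi, Gi) or decimal suffixes (K, M, G), and confusing the two is a common pitfall. A sketch of a resources block illustrating the difference:

```yaml
resources:
  requests:
    memory: "128Mi"   # 128 * 2^20 = 134,217,728 bytes
  limits:
    memory: "1Gi"     # 1 * 2^30 bytes; note that "1G" would mean 10^9 bytes
```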
Ephemeral storage resources
Kubernetes uses ephemeral storage for temporary data created by containers. This includes logs, cache, and any transient data necessary for short-term operations. Ephemeral storage is crucial for self-contained applications needing scratch disk space without relying on external storage solutions.
Setting resource requests and limits for ephemeral storage helps avoid overconsumption on host nodes. It provides applications with sufficient storage for temporary tasks while preventing excessive disk usage, which can degrade node performance.
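Ephemeral storage is requested and limited through the same resources block as CPU and memory, using the ephemeral-storage resource name. A minimal sketch (pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo             # hypothetical name
spec:
  containers:
  - name: worker
    image: busybox               # placeholder image
    command: ["sh", "-c", "sleep 3600"]
    resources:
      requests:
        ephemeral-storage: "1Gi" # scratch space the scheduler accounts for
      limits:
        ephemeral-storage: "2Gi" # the pod is evicted if it exceeds this
```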
Extended and custom resources
Extended resources in Kubernetes are specialized hardware like GPUs, FPGAs, or any custom-defined resources. These resources enable applications to leverage processing capabilities or unique hardware features.
Using extended resources requires additional configuration: nodes must advertise their inventory (typically via a device plugin) so the scheduler can account for them. Administrators can then allocate these resources to specific workloads, integrating them into the cluster’s resource management.
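Once a node advertises an extended resource, containers consume it through the limits field; for extended resources, Kubernetes requires requests and limits to be equal, so specifying the limit alone is sufficient. A sketch using the NVIDIA device plugin’s resource name (the pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job             # hypothetical name
spec:
  containers:
  - name: trainer
    image: cuda-app:latest  # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1   # extended resources cannot be overcommitted
```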
Code example: Setting resource requests and limits
To set resource requests and limits in Kubernetes, developers specify the desired configurations in the resources field within each container’s definition. This example demonstrates how to define CPU and memory requests and limits for a Pod with two containers: one for the main application and one for logging.
apiVersion: v1
kind: Pod
metadata:
  name: backend
spec:
  containers:
  - name: api-server
    image: nginx    # placeholder; substitute your application image
    resources:
      requests:
        memory: "128Mi"
        cpu: "500m"
      limits:
        memory: "256Mi"
        cpu: "1000m"
  - name: metrics-collector
    image: busybox  # placeholder; substitute your logging image
    resources:
      requests:
        memory: "128Mi"
        cpu: "500m"
      limits:
        memory: "256Mi"
        cpu: "1000m"
Important details in this configuration:
- Resource requests: Each container requests 128MiB of memory and 0.5 CPU (500 millicores), indicating the minimum resources each container needs. With both containers requesting these resources, the total request for the Pod is 1 CPU and 256MiB of memory.
- Resource limits: Each container has a limit of 256MiB of memory and 1 CPU (1000 millicores). These values represent the maximum resources each container is allowed. If a container attempts to exceed its CPU limit, Kubernetes will throttle it; if memory usage exceeds the limit, the container may be terminated. In total, the Pod is limited to 2 CPU and 512MiB of memory across both containers.
Best practices for resource management in Kubernetes
Here are some important practices to keep in mind when working with resource limits in Kubernetes.
Rightsizing resource requests and limits
Rightsizing involves aligning resource requests and limits closely with actual application needs. This technique minimizes waste and prevents resource starvation. It requires closely monitoring an application’s resource consumption patterns.
Tools for monitoring can provide insights that inform rightsizing decisions. By incrementally adjusting requests and limits based on application behavior, administrators can reach an optimal configuration.
Avoiding overcommitment and underutilization
Overcommitment occurs when the resources promised to workloads exceed the cluster’s actual capacity, causing bottlenecks and potential failures. Underutilization, at the other extreme, leads to increased costs and waste. Striking a balance between the two is essential for resource management in Kubernetes environments.
Administrators should fine-tune resource allocations based on consumption metrics to maintain equilibrium. This includes routinely reviewing usage patterns and adjusting configurations to reflect actual needs without oversupply.
Using LimitRanges and ResourceQuotas
LimitRanges and ResourceQuotas are tools in Kubernetes for managing resources across namespaces. LimitRanges define default and maximum resource constraints for pods, helping to standardize resource usage across deployments.
ResourceQuotas aggregate resource usage limits for a namespace, ensuring a fair share of resources among various teams or applications. By using these tools, organizations can enforce policies that optimize resource distribution and maintain cluster health.
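A sketch of both objects for a hypothetical team-a namespace: the LimitRange injects defaults into containers that omit requests or limits and caps per-container sizes, while the ResourceQuota caps aggregate consumption across the namespace (all names and values are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults   # hypothetical name
  namespace: team-a
spec:
  limits:
  - type: Container
    defaultRequest:          # applied when a container omits requests
      cpu: "250m"
      memory: "128Mi"
    default:                 # applied when a container omits limits
      cpu: "500m"
      memory: "256Mi"
    max:                     # hard cap per container
      cpu: "2"
      memory: "1Gi"
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota         # hypothetical name
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"       # sum of all requests in the namespace
    requests.memory: 20Gi
    limits.cpu: "20"         # sum of all limits in the namespace
    limits.memory: 40Gi
    pods: "50"
```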
Implementing Quality of Service (QoS) classes
Kubernetes assigns each pod one of three QoS classes based on its requests and limits: Guaranteed, Burstable, and BestEffort. These classes offer a tiered approach to managing workloads: under node resource pressure, BestEffort pods are typically evicted first and Guaranteed pods last.
Configuring requests and limits to achieve the appropriate QoS class ensures vital applications keep their resources, especially during peak loads. For example, latency-sensitive apps benefit from the Guaranteed class, while less critical tasks can run as BestEffort.
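QoS classes are not set directly; Kubernetes derives them from the pod spec. A pod is classed as Guaranteed only when every container sets both CPU and memory limits and its requests equal those limits, as in this sketch (pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: latency-critical    # hypothetical name
spec:
  containers:
  - name: app
    image: nginx            # placeholder image
    resources:
      requests:
        memory: "256Mi"
        cpu: "500m"
      limits:
        memory: "256Mi"     # equal to the request
        cpu: "500m"         # equal to the request -> Guaranteed QoS
```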
Leveraging Horizontal Pod Autoscaling (HPA)
Horizontal Pod Autoscaling (HPA) is a Kubernetes feature that dynamically adjusts the number of running pods based on current resource usage. By monitoring metrics like CPU utilization, HPA helps maintain application performance levels despite changing load conditions.
Implementing HPA requires carefully defined scaling policies to prevent abrupt changes that might affect stability. This automated scaling optimizes resource usage and availability, matching application needs.
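HPA policies are defined with the autoscaling/v2 API. The sketch below (the Deployment name and thresholds are illustrative) scales between 2 and 10 replicas to hold average CPU utilization near 70% — note that utilization here is measured relative to the pods’ CPU requests, which is another reason to set requests accurately:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-hpa              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend                # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # percent of the pods' requested CPU
```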
Related content: Read our guide to Kubernetes autoscaling
Automating Kubernetes infrastructure with Spot by NetApp
Spot Ocean from Spot by NetApp frees DevOps teams from the tedious management of their cluster’s worker nodes while helping reduce cost by up to 90%. Spot Ocean’s automated optimization delivers the following benefits:
- Container-driven autoscaling for the fastest matching of pods with appropriate nodes
- Easy management of workloads with different resource requirements in a single cluster
- Intelligent bin-packing for highly utilized nodes and greater cost-efficiency
- Cost allocation by namespaces, resources, annotations and labels
- Reliable usage of the optimal blend of spot, reserved and on-demand compute pricing models
- Automated infrastructure headroom ensuring high availability
- Right-sizing based on actual pod resource consumption
Learn more about Spot Ocean today!