Kubernetes Limits vs. Requests: Key Differences and How They Work

What Are Kubernetes Limits? 

Kubernetes limits define the maximum amount of compute resources that a container can use. These limits prevent a single container from consuming excessive resources, which can negatively impact other containers running on the same node. When configuring a pod, administrators specify these limits to ensure resource availability and system stability.

Exceeding these limits triggers enforcement by the kubelet, not the scheduler: a container that exceeds its memory limit is terminated (marked OOMKilled), while a container that exceeds its CPU limit is throttled rather than killed. A terminated container may then be restarted depending on the restart policy defined in its deployment configuration. Setting appropriate limits is crucial to prevent resource starvation and to maintain the overall health of the Kubernetes cluster.

What Are Kubernetes Requests?

Kubernetes requests specify the minimum amount of compute resources required for a container to run. When a container is deployed, the Kubernetes scheduler uses these values to decide on which node to place the pod. Nodes must have at least the requested resources available to be eligible to host the container, ensuring the application has the resources it needs to run properly.

Requests are used by Kubernetes to guarantee that applications can access the resources they need. Unlike limits, requests are not a hard cap: if a container attempts to use more than its requested resources and spare capacity is available on the node, it is allowed to do so. However, this can affect the scheduling of other pods that need those resources.
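For illustration, here is a minimal pod spec (pod and container names are hypothetical) that sets only requests; the container is guaranteed these amounts for scheduling purposes but may consume more when spare capacity exists on its node:

apiVersion: v1
kind: Pod
metadata:
  name: requests-only-pod        # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "0.25"      # guaranteed minimum, used for scheduling
        memory: "128Mi"  # not a cap; the container may use more if available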

This is part of a series of articles about Kubernetes architecture.

Kubernetes Limits vs. Kubernetes Requests: What Are the Differences?

Kubernetes limits and requests, while closely related, serve distinct purposes and affect pod scheduling and resource allocation in different ways.

1. Purpose and Function

Kubernetes limits primarily aim to set the maximum amount of compute resources that a container can consume. This is crucial to prevent a single container from using more than its fair share of resources, which could degrade the performance of other containers on the same node. Limits act as a hard cap on resources such as CPU and memory.

Kubernetes requests specify the minimum resources a container needs to start and run. This value is used by the Kubernetes scheduler to make informed decisions about where to place pods within the cluster. Nodes must have at least the requested amount of resources free to be eligible to host the pod.
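To make the distinction concrete, here is an illustrative resources block (values are arbitrary) showing both settings side by side:

resources:
  requests:          # used by the scheduler to choose a node
    cpu: "0.5"
    memory: "256Mi"
  limits:            # enforced at runtime as a hard cap
    cpu: "1"
    memory: "512Mi"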

2. Impact on Scheduling and Performance

Requests are considered during the initial scheduling of a pod. If a node does not have enough free resources to meet the requests of a pod, the pod will not be scheduled on that node. This ensures that each container has enough resources to perform as expected, preventing resource starvation.

Limits do not affect the initial scheduling of a pod but are crucial for managing resource usage over time. If a container tries to exceed its limit, Kubernetes takes corrective actions, such as throttling the container’s CPU usage or terminating the container if it exceeds memory limits.
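The enforcement mechanism differs per resource. The following annotated fragment (values are illustrative) summarizes what happens at runtime:

resources:
  limits:
    cpu: "1"         # usage above this is throttled (CPU is a compressible resource)
    memory: "512Mi"  # usage above this terminates the container with an OOMKilled status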

3. Resource Overcommitment and Safety

Requests allow for the overcommitment of resources. Administrators can set request values lower than typical resource usage to maximize node utilization. For example, a node with 4 CPU cores can host eight pods that each request 0.5 CPU, even if their combined peak usage would exceed the node’s capacity. This overcommitment means that while nodes may appear to have more resources allocated than physically available, not all applications will use their full resources at the same time.

Limits provide a safety mechanism that keeps overcommitment from destabilizing the system. They prevent any single application from using more than its allocated share, even when spare resources are temporarily available on the node. This helps maintain system stability, ensuring that no container can monopolize node resources.

4. Default Behaviors and Overhead

Without defined requests, Kubernetes treats the pod as if it has no minimum resource requirements (note that if a limit is set without a request, the request defaults to the limit value). This can lead to pods being scheduled on nodes that are already heavily loaded, potentially causing performance issues.

If limits are not set, a container can potentially use all available resources on a node, leading to resource starvation for other containers. 

Setting requests and limits can introduce a certain amount of overhead. Kubernetes continuously monitors the resource usage against the defined limits and requests, which can add computational overhead but is essential for effective resource management. This overhead is a trade-off for the benefit of better resource allocation.

The Importance of Kubernetes Requests and Limits 

Requests and limits are crucial for resource management in Kubernetes, ensuring that applications run reliably and efficiently. These settings prevent resource hogging and starvation, offering a balance between optimal resource utilization and fair resource allocation among different applications.

By using requests, Kubernetes can make better scheduling decisions, which in turn improves the performance and stability of applications. Limits help protect the health of the entire cluster by preventing any single application from consuming excessive system resources. Together, they provide the framework for managing compute resources in a multi-tenant environment.

How to Set Up Kubernetes Limits 

To set up Kubernetes limits, you need to define resource limits in your Pod’s configuration. Here is a step-by-step guide on how to configure these limits:

  1. Define the pod manifest: In your deployment YAML file, you specify the limits under the resources section of each container specification.
  2. Specify CPU and memory limits: You can set limits for both CPU and memory. CPU is specified in CPU units, where 1 is equivalent to one physical or virtual core (for example, one vCPU on AWS EC2). Memory is specified in bytes, but you can use suffixes such as Mi (mebibytes) or Gi (gibibytes) for readability.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: example-container
    image: nginx
    resources:
      limits:
        memory: "512Mi"
        cpu: "1"

In this example, the example-container is limited to 1 CPU core and 512 MiB of memory.

  3. Apply the configuration: Use the command kubectl apply -f <filename>.yaml to apply your configuration. Replace <filename>.yaml with the name of your file containing the pod definition.
  4. Monitoring and adjustment: After deployment, monitor the container’s resource usage to ensure that the limits are appropriate, and adjust them as necessary based on the performance and resource usage observed (see the example commands below).
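For the monitoring step, assuming the metrics-server add-on is installed in your cluster, you can compare actual usage against the configured limits with kubectl:

kubectl top pod example-pod        # actual CPU and memory usage (requires metrics-server)
kubectl describe pod example-pod   # configured limits and recent events (e.g., OOMKilled)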

How to Set Up Kubernetes Requests

Setting up Kubernetes requests involves defining the minimum resources required for your container to operate. To configure these requests:

  1. Define the pod manifest: Similar to setting limits, requests are specified in the deployment YAML under the resources section for each container.
  2. Specify CPU and memory requests: Requests indicate the guaranteed amount of resources that Kubernetes must allocate to the container. If these resources are not available on a node, the pod won’t be scheduled until they become available.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: example-container
    image: nginx
    resources:
      requests:
        memory: "256Mi"
        cpu: "0.5"

Here, the example-container requires at least 0.5 CPU cores and 256 MiB of memory to run. This setup helps Kubernetes make informed scheduling decisions.

  3. Apply the configuration: Deploy the configuration using kubectl apply -f <filename>.yaml.
  4. Evaluate performance: Regularly assess the container’s resource consumption to determine if the set requests are adequate or need adjustment based on actual usage patterns (see the example commands below).
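For the evaluation step, the following commands (the node name is a placeholder) show how requested resources compare to what is actually consumed, again assuming metrics-server is available:

kubectl describe node <node-name>   # lists each pod's requests and the node's allocation totals
kubectl top pod example-pod         # actual usage, for comparison against the request values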

Best Practices for Setting Kubernetes Requests and Limits 

Here are some recommended practices for setting limits and requests in Kubernetes.

Ensure Requests and Limits Are Rightsized

To maintain optimal performance and cost-efficiency, it is crucial to regularly evaluate and adjust Kubernetes requests and limits according to the actual needs of your applications. This process, known as rightsizing, involves analyzing historical resource usage data to ensure that settings are aligned with current demands. 

Ensure Memory Requests and Limits Are Equal

For memory management, it is advisable to set requests and limits to equal values. This practice stabilizes the memory allocated to containers, reducing the likelihood of containers being terminated for exceeding their memory limits. It also makes memory consumption on each node more predictable, since a pod can never use more memory than the scheduler accounted for when placing it.
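A minimal sketch of this practice, with illustrative values:

resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "512Mi"  # equal to the request, so the pod can never exceed what was scheduled for it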

Avoid Using CPU Limits

Setting CPU limits can be counterproductive as it might throttle the performance of applications, leading to increased response times and decreased throughput. Instead, defining CPU requests allows applications to access the necessary CPU resources under normal load and additional resources when available, providing better flexibility and application responsiveness. 
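Following this practice, a container spec might set a CPU request and a memory limit while deliberately omitting the CPU limit (values are illustrative):

resources:
  requests:
    cpu: "0.5"       # guarantees a CPU share for scheduling
    memory: "256Mi"
  limits:
    memory: "512Mi"  # memory is still capped; no cpu limit, so the container is never throttled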

Learn more in our detailed guide to Kubernetes CPU limits.

Use Horizontal Pod Autoscaling (HPA) for Dynamic Workloads

Kubernetes Horizontal Pod Autoscaling (HPA) adjusts the number of pod replicas in a deployment based on observed CPU utilization or other metrics, which is essential for managing workloads with variable demand. By automatically scaling out during peak times and in during low-usage periods, HPA ensures that applications maintain high performance and availability while optimizing resource utilization and reducing costs.
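As a sketch, an autoscaling/v2 HorizontalPodAutoscaler targeting 70% average CPU utilization (measured relative to the pods' CPU requests) for a hypothetical Deployment named example-deployment might look like this:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa             # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment   # hypothetical target Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # scale out when average usage exceeds 70% of requested CPU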

Learn more in our detailed guide to Kubernetes HPA.

Implement Quality of Service (QoS) Classes

Kubernetes supports three Quality of Service (QoS) classes: Guaranteed, Burstable, and BestEffort, which are determined by how requests and limits are set on a pod’s containers. Utilizing these QoS classes allows for the prioritization of critical workloads, ensuring they receive the necessary resources, especially under resource-constrained conditions.
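For example, a pod whose containers all have requests equal to limits for both CPU and memory is placed in the Guaranteed class (illustrative values below); pods with requests lower than limits are Burstable, and pods with neither set are BestEffort:

resources:
  requests:
    cpu: "1"
    memory: "512Mi"
  limits:
    cpu: "1"         # requests equal limits for every resource and container...
    memory: "512Mi"  # ...so Kubernetes assigns the Guaranteed QoS class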

Related content: Read our guide to Kubernetes monitoring

Automating Kubernetes Infrastructure with Spot by NetApp

Spot Ocean from Spot by NetApp frees DevOps teams from the tedious management of their cluster’s worker nodes while helping reduce cost by up to 90%. Spot Ocean’s automated optimization delivers the following benefits:

  • Container-driven autoscaling for the fastest matching of pods with appropriate nodes
  • Easy management of workloads with different resource requirements in a single cluster
  • Intelligent bin-packing for highly utilized nodes and greater cost-efficiency
  • Cost allocation by namespaces, resources, annotations and labels
  • Reliable usage of the optimal blend of spot, reserved and on-demand compute pricing models
  • Automated infrastructure headroom ensuring high availability
  • Right-sizing based on actual pod resource consumption  

Learn more about Spot Ocean today!