What Is a Kubernetes Persistent Volume?
In Kubernetes, a PersistentVolume (PV) is a storage resource that lets you durably manage storage for your containerized applications. PVs are the persistent storage element in a Kubernetes architecture. PV resources belong to the cluster and exist independently of pods. Any disks and data represented by a PV continue to exist even as the cluster changes and pods are deleted and recreated.
While it is possible to create a PV manually, you can also provision PVs dynamically and have their lifecycle managed by Kubernetes. Dynamic provisioning is driven by PersistentVolumeClaims (PVCs), which specify the details of the request for storage, such as size and access mode.
In this article, you will learn:
- Why Do We Use Persistent Volumes?
- Lifecycle Stages of a Persistent Volume and Claim
- Tools for Persistent Volumes and Storage
- Automating Kubernetes Infrastructure with Spot by NetApp
Why Do We Use Persistent Volumes?
When containerization was first introduced, early adoption was mainly for stateless services. Today, data-centric applications in the cloud also run in containers, because Persistent Volumes make it possible to store, retain, and back up container-based data.
Persistent Volumes let you introduce consistency by ensuring that a database has access to its data at all times. This means you can use Kubernetes for databases, such as MySQL and Cassandra. Once data is consistent, you can add complex workloads, including both stateless and stateful code, into your containers.
Persistent Volumes help simplify the deployment of stateful, distributed applications. In this case, you need to ensure that each pod is created with the relevant configuration and environment variables. Then, according to the specifications of a PVC, a PV that meets the requirements is bound to the claim, and the storage is mounted inside the pod that references the claim. This process enables you to easily scale pods, maintain their state, and quickly replace resources in case of failure.
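One way to picture this is a StatefulSet, which gives each replica its own PVC through a claim template. The sketch below is only an illustration under assumptions: the text above does not prescribe a workload type, and the names (`mysql`, `mysql-secret`, `data`), the image tag, and the sizes are all hypothetical. It also assumes that the cluster has a default StorageClass able to satisfy the claim, and that a headless Service named `mysql` and a Secret named `mysql-secret` already exist.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql                           # hypothetical name
spec:
  serviceName: mysql                    # assumes a headless Service with this name exists
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret    # hypothetical Secret
                  key: root-password
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql # MySQL data directory
  volumeClaimTemplates:                 # one PVC is created per replica from this template
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
```

With this layout, the claim for the first replica would be named `data-mysql-0`. If the pod is deleted and recreated, it attaches to the same claim, so the data directory survives the replacement.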
Learn more in our detailed guide to Kubernetes pods.
Lifecycle Stages of a Persistent Volume and Claim
PVs and PVCs follow a lifecycle that starts with provisioning, moves on to binding, continues with using, and ends with reclaiming. During reclaiming, the reclaim policy of the volume determines what happens to it, for example whether it is retained or deleted.
Provisioning
Here are the two main options available for provisioning PVs:
- Static provisioning: a cluster administrator manually creates PVs that carry the details of the storage available to cluster users. These PVs exist in the Kubernetes API and are available for consumption.
- Dynamic provisioning: when none of the manually created PVs matches a PVC, the cluster can provision a volume specifically for that claim. Provisioning is based on StorageClasses, so the PVC must request a storage class that an administrator has created and configured.
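As a rough illustration of the two options, the manifests below sketch a statically provisioned PV and a PVC that would trigger dynamic provisioning. The names (`pv-static-example`, `pvc-dynamic-example`, `manual`, `fast-ssd`), the NFS server address, and the sizes are hypothetical. The `fast-ssd` StorageClass is assumed to already exist in the cluster and to be backed by a working provisioner.

```yaml
# Static provisioning: an administrator creates the PV by hand.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-static-example          # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual         # hypothetical class name, used only for matching
  nfs:                             # assumed backend; other volume types work the same way
    server: nfs.example.internal   # hypothetical server
    path: /exports/data
---
# Dynamic provisioning: no PV is created by hand. The claim names a
# StorageClass, and the cluster provisions a matching volume on demand.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-dynamic-example        # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd       # assumed to exist in the cluster
  resources:
    requests:
      storage: 20Gi
```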
Binding
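The binding process matches each PVC with a PV that satisfies the request. The control plane looks for a PV that offers at least the requested capacity and the requested access modes, and then binds the two together. A bound PV may be larger than the request, and a claim stays unbound until a matching volume becomes available. Once made, the match is exclusive: it is a one-to-one mapping that uses a ClaimRef, which creates a bi-directional binding between the PV and the PVC.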
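After binding, both objects record the relationship. The trimmed manifests below sketch roughly what a bound pair might look like when read back from the API. They are not meant to be applied as-is: the object names, namespace, and sizes are hypothetical, and the volume source and generated fields such as `uid` are omitted.

```yaml
# PV side: spec.claimRef points at the claim that owns this volume.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-example                 # hypothetical name
spec:
  capacity:
    storage: 10Gi                  # may exceed what the claim asked for
  accessModes:
    - ReadWriteOnce
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    namespace: default
    name: pvc-example
status:
  phase: Bound
---
# PVC side: spec.volumeName points back at the bound volume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-example                # hypothetical name
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  volumeName: pv-example
status:
  phase: Bound
```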
Using
Pods use claims as volumes. The cluster inspects the claim to find the bound volume and mounts that volume for the pod. Once a claim is bound, the PV is reserved for the user for as long as they need it. Users can schedule pods and access their claimed PVs by adding a persistentVolumeClaim section in the volumes block of the pod template.
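A minimal sketch of such a pod template follows. The pod name, image, mount path, and claim name are placeholders chosen for illustration; the claim `pvc-example` is assumed to already exist in the same namespace as the pod.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-storage           # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:stable          # placeholder image
      volumeMounts:
        - name: app-data           # must match a name in the volumes block
          mountPath: /usr/share/nginx/html
  volumes:
    - name: app-data
      persistentVolumeClaim:
        claimName: pvc-example     # hypothetical claim in the pod's namespace
```

The pod refers only to the claim, never to the PV directly, which keeps the pod template independent of the underlying storage.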
Learn more in our detailed guide to the Kubernetes job scheduler.
Reclaiming
Once users no longer need their volume, they can delete the PVC object from the API, which allows the resource to be reclaimed. The reclaim policy of the PV tells the cluster what to do with the volume after its claim is released. Volumes can be retained, recycled, or deleted, although the recycle policy is deprecated in favor of dynamic provisioning.
Retain
The Retain reclaim policy enables manual reclamation of a resource. When a PVC is deleted, the PV continues to exist and the volume is considered released. Because the PV still contains the previous user's data, it is not yet available for another claim. To reclaim the volume, an administrator must do so manually: delete the PV object, clean up the data on the underlying storage asset, and then either delete the storage asset or create a new PV that reuses it.
Delete
The Delete reclaim policy removes both the PV object and the associated storage asset in the external infrastructure. Note that dynamically provisioned PVs inherit the reclaim policy of their StorageClass, which defaults to Delete.
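If the default is not what you want, the policy can be set on the StorageClass itself. The following is a minimal sketch; the class name and the provisioner string are hypothetical and would need to be replaced with a provisioner that is actually installed in your cluster.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: retained-storage           # hypothetical name
provisioner: csi.example.com       # hypothetical driver; use your cluster's provisioner
reclaimPolicy: Retain              # omit this field to get the default, Delete
```

PVs provisioned through a class like this would keep their underlying storage after the claim is deleted, at the cost of requiring manual cleanup.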
Tools for Persistent Volumes and Storage
Storage Plug-Ins
A plug-in provides extended functionality. A storage plug-in gives you more management control over your persistent storage. For example, the basic functions of a volume plug-in let you create, mount, and delete persistent volumes. You can also use plug-ins to add support for commands from Kubernetes.
Kubernetes itself offers native storage plug-ins, and the majority of storage companies use plug-ins to provide additional features. For example, a container API built into a third-party storage solution often provides features that simplify the container management process. Typically, these features help consume existing storage and manage volumes from multiple hosts.
Data Volume Containers
A data volume is a directory on the host filesystem that exists independently of the container, outside the Union File System. The purpose of a data volume is to introduce data persistence into the container lifecycle. Multiple containers can share a data volume, with each container using it as an access point to the data it needs. A data volume can persist after a container is deleted.
It is relatively easy to configure a data volume. However, managing data volumes can quickly turn into a complex operation. When containers are deleted, their data can become orphaned, and not all orchestrators catch this data and clean it up. Orphaned data can be garbage-collected because data volumes are directly accessible from the host, but note that this process can corrupt data access privileges.
Directory Mounts
A directory mount maps a directory on the host into the container, preserving the host's directory structure inside the container. This enables persistence and reusability. However, because directory mounts provide read and write access, they might create security gaps.
For example, a directory mount can expose a host system directory. A connected container then gains the ability to modify or delete its content. This type of connection creates a vulnerability that might enable malicious actors to manipulate data or delete an entire data volume.
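One common way to narrow this exposure is to mount the directory read-only. The sketch below assumes that a Kubernetes `hostPath` volume is the relevant form of directory mount; the pod name, image, command, and host path are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: host-dir-reader            # hypothetical name
spec:
  containers:
    - name: reader
      image: busybox:1.36
      command: ["sh", "-c", "ls /host-data && sleep 3600"]
      volumeMounts:
        - name: host-data
          mountPath: /host-data
          readOnly: true           # the container cannot modify or delete host files
  volumes:
    - name: host-data
      hostPath:
        path: /var/lib/app-data    # hypothetical host directory
        type: Directory            # the directory must already exist on the node
```

A read-only mount does not remove the risk of exposing sensitive host data, so the mounted path should still be as narrow as possible.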
Automating Kubernetes Infrastructure with Spot by NetApp
Spot by NetApp recently introduced the concept of storageless, building on the capabilities of Spot Ocean, our serverless container engine, which delivers compute infrastructure intelligently and automatically. With storageless volumes, the complexities and overhead of storage management are eliminated.
Much like with serverless computing, storage volumes are dynamically managed based on how applications are consuming them. This approach enables developers and operators to build and run applications without architecting the size and shape of Persistent Volumes, including throughput, maintenance, and capacity provisioning. Spot Ocean takes an application-driven approach to provisioning, scaling, and managing both Kubernetes nodes and storage classes for the highest performance and the lowest possible cost. Users only need to define simple storage requirements, which are maintained as storageless volumes, and when pods request storage, Ocean delivers the right type with the right volume size.
Learn more about leveraging storageless volumes as part of your infrastructure management solution with Spot by NetApp.