Introduction to Stateful Applications on Kubernetes

In the upcoming weeks, I will be writing a series of blog posts covering stateful applications running on Kubernetes.

  • #1 – This post: Introduction to Stateful Applications on Kubernetes
  • #2 – Storage Classes & Dynamic Provisioning
  • #3 – StatefulSets & PodDisruptionBudgets

Before we begin, it is important to understand the terms Pod, Volume, Persistent Volume and Persistent Volume Claim.

Pod

A pod is a group of one or more containers (such as Docker containers), with shared storage/network, and a specification for how to run the containers.
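
As a rough illustration of such a specification, here is a minimal sketch of a single-container Pod. The Pod name, container name and image are placeholders chosen for this example, not part of the setup used later in this post.

```yaml
# Minimal Pod sketch; names and image are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
    - name: web              # a Pod may list more than one container here
      image: nginx:1.25
      ports:
        - containerPort: 80
```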


Volume

When running multiple containers together in a Pod, it is often necessary to share files between those containers; volumes are used for this purpose.
On-disk files in a container are ephemeral. When a container crashes, it is restarted, but the files are lost and the container starts with a clean state. Volumes address this problem as well, because a volume outlives any single container in the Pod.

Persistent Volume

A PersistentVolume (PV) is a piece of storage in the cluster. It is a resource just like a node is a cluster resource.
PVs are volume plugins, like Volumes, but have a lifecycle independent of any individual Pod that uses the PV.

Volume vs. Persistent Volume

The relationship between Volumes and Persistent Volumes is similar to the one between ephemeral instance disks and EBS volumes on AWS.
Ordinary Volumes are deleted when their Pod is deleted, while Persistent Volumes are separate entities, completely decoupled from the Pod. PVs are managed through their own API resources and kubectl commands, separate from those for Pods, and have their own lifecycle.

Volumes Internals

  • To use a volume, a Pod specifies which volumes to provide (the spec.volumes field) and where to mount them into each container (the spec.containers[*].volumeMounts field).
  • A process in a container sees a filesystem view composed of its container image and volumes. The image is at the root of the filesystem hierarchy, and any volumes are mounted at the specified paths within the image.
  • Volumes cannot be mounted onto other volumes or have hard links to other volumes. Each container in the Pod must independently specify where to mount each volume.
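
The points above can be sketched with a Pod in which two containers share one volume, each mounting it at its own path. This is only an illustration: the Pod, container and volume names, the busybox image and the shell commands are assumptions made up for this example.

```yaml
# Sketch: two containers sharing one emptyDir volume (all names are illustrative).
apiVersion: v1
kind: Pod
metadata:
  name: shared-files-example
spec:
  volumes:
    - name: shared-data
      emptyDir: {}           # lives as long as the Pod; survives container restarts
  containers:
    - name: writer
      image: busybox:1.36
      command: ["sh", "-c", "while true; do date >> /out/log.txt; sleep 5; done"]
      volumeMounts:
        - name: shared-data  # must match an entry in spec.volumes
          mountPath: /out
    - name: reader
      image: busybox:1.36
      command: ["sh", "-c", "sleep 10; tail -f /in/log.txt"]
      volumeMounts:
        - name: shared-data  # same volume, mounted at a different path
          mountPath: /in
          readOnly: true
```

Because the volume here is an emptyDir, its contents disappear when the Pod itself is deleted, which is the gap Persistent Volumes are meant to fill.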

Persistent Volume Claims

A PersistentVolumeClaim (PVC) is a request for storage. It is similar to a Pod: while Pods consume node resources, PVCs consume Persistent Volume resources. Pods can request specific levels of resources (CPU and memory). Claims can request a specific size and specific access modes (e.g., mounted read/write by a single node, or read-only by many nodes).
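
To make the analogy concrete, here is a small sketch that places a Pod's resource request next to a claim's storage request. The names, image and quantities are hypothetical values picked for illustration.

```yaml
# Sketch: a Pod requests CPU and memory from a node (illustrative values).
apiVersion: v1
kind: Pod
metadata:
  name: resource-request-example
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          cpu: 250m
          memory: 256Mi
---
# Sketch: a PVC requests capacity and an access mode from a Persistent Volume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: storage-request-example
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```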


Tying it all together

To demonstrate how PersistentVolumes work, let's use a real-life example: setting up a database container in a Pod and mounting its data directory on a Persistent Volume.

Persistent Volume definition: database-pv.yml

kind: PersistentVolume
apiVersion: v1
metadata:
    name: database-pv
    labels:
        type: amazonEBS
spec:
    capacity:
        storage: 5Gi
    accessModes:
        - ReadWriteOnce
    awsElasticBlockStore:
        volumeID: vol-123abcd
        fsType: ext4

$ kubectl create -f database-pv.yml

persistentvolume "database-pv" created

Note: ReadWriteOnce means the storage can be mounted for reading and writing by only one node at a time.

PVC definition: requesting 5Gi in ReadWriteOnce mode – database-pvc.yml

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
    name: database-pvc
    labels:
        type: amazonEBS
spec:
    accessModes:
        - ReadWriteOnce
    resources:
        requests:
            storage: 5Gi

$ kubectl create -f database-pvc.yml

persistentvolumeclaim "database-pvc" created
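
One caveat worth sketching: a claim is matched to a volume by size, access mode and storage class, so on a cluster that has a default StorageClass, a claim like the one above may be satisfied by a freshly provisioned volume instead of database-pv. The variant below is one possible way to pin the claim to the volume created earlier. The storageClassName and volumeName fields are my additions, not part of the original definition, and assume the PV itself has no storage class set.

```yaml
# Variant of database-pvc.yml that binds explicitly to database-pv (assumed additions marked).
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: database-pvc
spec:
  storageClassName: ""       # assumption: skip dynamic provisioning, match class-less PVs only
  volumeName: database-pv    # assumption: bind to the PV defined above by name
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```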

Next, let's create a Deployment resource. It will in turn create a Pod that uses the previously created PVC, which claims the PersistentVolume.

Key parts here are:

  • volumeMounts defines which volumes are mounted into the container and where. /app/database is the directory where the database server stores all its data.
  • volumes defines the volumes available to the containers in this Deployment definition; here, a single volume backed by the database-pvc claim.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: database
spec:
  selector:
    matchLabels:
      app: database
  replicas: 1
  template:
    metadata:
      labels:
        app: database
    spec:
      containers:
      - name: database-pod
        image: repo/database
        volumeMounts:
        - mountPath: "/app/database"
          name: database-pvc
        ports:
        - containerPort: 3306
      volumes:
      - name: database-pvc
        persistentVolumeClaim:
          claimName: database-pvc

$ kubectl create -f database-rc.yml

deployment "database" created

All Set.

Now, if we delete the database Pod, or the Pod is destroyed for any reason, Kubernetes will automatically re-create it. The storage will still exist, and the new container will see the same data, up to the last write that reached the filesystem.