Use Ocean’s Cluster Roll to update nodes

Reading Time: 7 minutes

Kubernetes has what you might consider an aggressive release cycle: there have been three to four releases per year since 1.0 shipped in July of 2015, so it’s all too easy to fall a couple of versions behind. Running the latest, or nearly latest, release helps protect your organization from security issues, because the project only provides patch support for the most recent three minor versions; anything older stops receiving fixes. Staying current isn’t just about security, though. You also get access to new features and enhancements.

When using a managed Kubernetes service like Amazon’s EKS, Google’s GKE, or Microsoft’s AKS, the control plane can be updated through a UI, CLI, or API call. However, this still leaves your worker nodes running an older version. Each cloud provider has options to bring your worker nodes up to the current version. In this example, we’ll be using Amazon EKS. Amazon has an excellent article covering the cluster upgrade process, and a comment in that article is of particular relevance if you are using Spot Ocean to handle the lifecycle of your worker nodes.

We also recommend that you update your self-managed nodes to the same version as your control plane before updating the control plane. 

Spot Ocean has you covered. You can update your worker nodes using a feature called “Cluster Roll.” This feature lets you update all of the nodes that are part of a virtual node group (VNG) in an orderly fashion. New nodes are brought up with whatever changes you are requesting. (In this case, we’ll be replacing the AMI with one that matches the new Kubernetes version.) Existing nodes are marked as “NoSchedule” and are drained so that the pods are transitioned to the new nodes.
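Ocean handles the cordon and drain for you, but it can be useful to see roughly what that corresponds to in plain kubectl terms. Here is a minimal sketch using a placeholder node name:

# Mark the node unschedulable so no new pods land on it
% kubectl cordon <node-name>

# Evict the running pods so they are rescheduled onto other nodes
% kubectl drain <node-name> \
    --ignore-daemonsets \
    --delete-emptydir-data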

If you would like to watch a demonstration of the whole process, go ahead and press play. Otherwise, please scroll down and continue reading.


https://youtu.be/aAzNv-rsv_A


Let’s walk through the process together.

The Demo Environment

We have an EKS cluster running K8s v1.19 and wish to upgrade to 1.20. Kubernetes upgrades should be done incrementally, one version at a time. We can verify the current version using the AWS CLI, kubectl, or the web console. Provided the AWS CLI is configured with appropriate credentials, running:

% aws eks describe-cluster --name knauer-eks-Zi4XDZoO

{
    "cluster": {
        "name": "knauer-eks-Zi4XDZoO",
        ...
        "version": "1.19",
        ...
    }
}

returns the version. An alternative method, assuming a valid kubeconfig file is available, would be to use kubectl:

% kubectl version --short

Client Version: v1.21.2
Server Version: v1.19.8-eks-96780e
WARNING: version difference between client (1.21) and server (1.19) exceeds the supported minor version skew of +/-1

As a side note: you also need to update your local kubectl periodically in order to keep managing your clusters. Generally, you want to stay within one minor release of whatever cluster version you are managing. In this example, we get a warning because the client is two minor versions ahead of the cluster.
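One way to pull down a kubectl release that matches the target cluster version is to grab it from the official Kubernetes release URL. This sketch assumes a Linux amd64 workstation and uses v1.20.4 as an example; adjust the version and platform for your environment.

% curl -LO https://dl.k8s.io/release/v1.20.4/bin/linux/amd64/kubectl
% chmod +x kubectl
% sudo mv kubectl /usr/local/bin/kubectl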

This cluster has two worker nodes currently.

% kubectl get nodes

NAME                                       STATUS   ROLES    AGE   VERSION
ip-10-0-1-127.us-west-2.compute.internal   Ready    <none>   49m   v1.19.6-eks-49a6c0
ip-10-0-3-146.us-west-2.compute.internal   Ready    <none>   45s   v1.19.6-eks-49a6c0

The first node listed is part of an AWS Auto Scaling group (ASG); note that this ASG is not an EKS managed node group. The second node belongs to a Spot Ocean managed VNG and is the one that will be replaced during the cluster roll.
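If you ever need to confirm which nodes Ocean is managing, one approach (assuming the Ocean controller has applied its spotinst.io/node-lifecycle label, which it adds to the instances it launches) is to ask kubectl to print that label as an extra column; nodes outside the VNG will show an empty value.

% kubectl get nodes -L spotinst.io/node-lifecycle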

Upgrade the Control Plane

There are multiple ways to upgrade our EKS cluster version. For this example cluster, we are going to use the AWS CLI to handle the upgrade.

From the AWS documentation, we need to run:

% aws eks update-cluster-version \
  --region <region-code> \
  --name <my-cluster> \
  --kubernetes-version <desired version>

and substitute in our values. The “region” is the AWS region the EKS cluster is provisioned in, and the “name” is the name of the EKS cluster. Set “kubernetes-version” to your desired version, which should be one minor version above the currently running one. Since we are at 1.19, we’ll use “1.20”.

With all the required variables replaced, we’ll run:

% aws eks update-cluster-version \
  --region us-west-2 \
  --name knauer-eks-Zi4XDZoO \
  --kubernetes-version 1.20

This will return an ID that can be used to check the status of the upgrade.
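If you are scripting the upgrade, you could capture just that ID by adding a JMESPath query to the command above, for example:

% aws eks update-cluster-version \
  --region us-west-2 \
  --name knauer-eks-Zi4XDZoO \
  --kubernetes-version 1.20 \
  --query 'update.id' \
  --output text

Either way, the ID is then passed to describe-update to poll the status of the upgrade: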

% aws eks describe-update \
  --region us-west-2 \
  --name knauer-eks-Zi4XDZoO \
  --update-id ffe2232e-6389-4880-9ecc-4a6d65c1e42d

  {
    "update": {
        "id": "ffe2232e-6389-4880-9ecc-4a6d65c1e42d",
        "status": "InProgress",
        "type": "VersionUpdate",
        "params": [
            {
                "type": "Version",
                "value": "1.20"
            },
            {
                "type": "PlatformVersion",
                "value": "eks.1"
            }
        ],
       ...
    }
  }

The upgrade process will take several minutes. Eventually the status will return as “Successful.”

{
    "update": {
        "id": "ffe2232e-6389-4880-9ecc-4a6d65c1e42d",
        "status": "Successful",
        "type": "VersionUpdate",
        ...
    }
}
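If you would rather not poll by hand, the AWS CLI also ships a waiter that blocks until the cluster returns to an active state:

% aws eks wait cluster-active \
  --region us-west-2 \
  --name knauer-eks-Zi4XDZoO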

Once it has completed successfully, we can verify the control plane has been upgraded. The value returned for “Server Version” has updated and now shows 1.20.x instead of 1.19.x.

% kubectl version --short

Client Version: v1.21.2
Server Version: v1.20.4-eks-6b7464

Upgrade the Worker Nodes

While the control plane has been upgraded to 1.20, our data plane (the worker nodes) is still running 1.19.

% kubectl get nodes

NAME                                       STATUS   ROLES    AGE   VERSION
ip-10-0-1-127.us-west-2.compute.internal   Ready    <none>   90m   v1.19.6-eks-49a6c0
ip-10-0-3-146.us-west-2.compute.internal   Ready    <none>   42m   v1.19.6-eks-49a6c0

Get a New AMI ID

Amazon makes updated EKS-optimized AMIs available, but we need the AMI ID in order to move forward. One way to get it is to query the SSM Parameter Store with the aws command.

% aws ssm get-parameter \
  --name /aws/service/eks/optimized-ami/1.20/amazon-linux-2/recommended/image_id \
  --region us-west-2 \
  --query "Parameter.Value" \
  --output text

ami-0b05016e79e1e54c6
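If you plan to feed that value into other tooling, it can be handy to capture it in a shell variable instead of copying it by hand, for example:

% AMI_ID=$(aws ssm get-parameter \
    --name /aws/service/eks/optimized-ami/1.20/amazon-linux-2/recommended/image_id \
    --region us-west-2 \
    --query "Parameter.Value" \
    --output text)
% echo "$AMI_ID"
ami-0b05016e79e1e54c6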

Now that we have the AMI ID we can proceed with upgrading the worker nodes that are part of the Ocean VNG.

Edit the VNG

There are a few ways to initiate a cluster roll with Spot Ocean.

  1. The Spot UI
  2. The spotctl CLI
  3. The Spot API
  4. A Spot SDK

We’ll go ahead and use the Spot UI for this example; just be aware that there are automation-friendly methods available.
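For reference, here is a rough sketch of what the API route could look like. The endpoint path, payload fields, token variable, and cluster ID shown here are illustrative assumptions; consult the Spot API documentation for the authoritative request format.

# Illustrative only: start a roll for Ocean cluster o-abcd1234 (placeholder ID)
% curl -X POST \
    -H "Authorization: Bearer ${SPOT_API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d '{"roll": {"batchSizePercentage": 20}}' \
    "https://api.spotinst.io/ocean/aws/k8s/cluster/o-abcd1234/roll"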

Note: You can find additional information on this process in the Spot documentation.

Log into the Spot UI and navigate to the “Virtual Node Groups” tab. Edit the virtual node group by clicking the VNG name.

[Screenshot: Virtual Node Groups list]

Now replace the “Image” value with the new AMI ID.

[Screenshot: Edit VNG, Image field]

Note: We can use the “View AMI Details” link to double-check that this is the AMI we wanted.

Finally, don’t forget to press the “Save” button after you have pasted in the new AMI ID. 

Start the Cluster Roll

Starting the cluster roll is a quick process. First, switch to the “Cluster Rolls” tab.

[Screenshot: Cluster Rolls tab]

Second, select “Cluster Roll” from the “Actions” drop-down menu.

[Screenshot: Actions menu, Cluster Roll]

Since this VNG has only one node, there is no need to split the roll into smaller batches. The third step is to click the “Roll” button, which submits the request and initiates the cluster roll.

Finally, we can switch to the “Log” tab and verify that the cluster roll started.

[Screenshot: cluster roll log]

What’s Happening in the Cluster?

Now that the cluster roll is in progress, let’s take a deeper look at what is happening inside the cluster.

Running kubectl get nodes shows that we still have our two nodes. Notice that scheduling has been disabled for the worker node that the cluster roll is about to replace.

% kubectl get nodes

NAME                                       STATUS                     ROLES    AGE    VERSION
ip-10-0-1-127.us-west-2.compute.internal   Ready                      <none>   121m   v1.19.6-eks-49a6c0
ip-10-0-3-146.us-west-2.compute.internal   Ready,SchedulingDisabled   <none>   72m    v1.19.6-eks-49a6c0
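If you would rather not keep re-running the command, you can ask kubectl to stream node changes as they happen:

% kubectl get nodes --watch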

Wait a minute or so and then re-run the same command. Spot Ocean has provisioned a new node in the VNG with the updated AMI. The “NotReady” status tells us it isn’t ready for pods yet.

% kubectl get nodes

NAME                                       STATUS                     ROLES    AGE    VERSION
ip-10-0-1-127.us-west-2.compute.internal   Ready                      <none>   122m   v1.19.6-eks-49a6c0
ip-10-0-1-33.us-west-2.compute.internal    NotReady                   <none>   16s    v1.20.4-eks-6b7464
ip-10-0-3-146.us-west-2.compute.internal   Ready,SchedulingDisabled   <none>   74m    v1.19.6-eks-49a6c0

After another minute or two, the new node is ready to go.

% kubectl get nodes

NAME                                       STATUS                     ROLES    AGE    VERSION
ip-10-0-1-127.us-west-2.compute.internal   Ready                      <none>   123m   v1.19.6-eks-49a6c0
ip-10-0-1-33.us-west-2.compute.internal    Ready                      <none>   98s    v1.20.4-eks-6b7464
ip-10-0-3-146.us-west-2.compute.internal   Ready,SchedulingDisabled   <none>   75m    v1.19.6-eks-49a6c0
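Before the old node is terminated, you can also confirm that workloads have landed on the replacement by filtering pods by node name (using the new node’s name from the output above):

% kubectl get pods --all-namespaces -o wide \
    --field-selector spec.nodeName=ip-10-0-1-33.us-west-2.compute.internal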

The original Ocean-managed node running 1.19 has been removed from the EKS cluster. The new node is running v1.20.x, the same version as the updated control plane. The node that is part of the ASG, not the Ocean VNG, is still running the previous version.

% kubectl get nodes

NAME                                       STATUS   ROLES    AGE    VERSION
ip-10-0-1-127.us-west-2.compute.internal   Ready    <none>   136m   v1.19.6-eks-49a6c0
ip-10-0-1-33.us-west-2.compute.internal    Ready    <none>   13m    v1.20.4-eks-6b7464

Summary

We’ve successfully walked through the process of upgrading an EKS cluster’s worker nodes to a new Kubernetes version using Spot Ocean’s cluster roll feature. Kubernetes version upgrades are not the only use case for a cluster roll: you might need to roll out a new AMI in response to a security CVE, or make other changes to your worker nodes even when the control plane isn’t being upgraded. Please stay tuned for additional posts highlighting time-saving features of Spot Ocean. Thanks for following along!