Google Kubernetes Engine: Architecture, Pricing & Best Practices

What Is Google Kubernetes Engine (GKE)?

Google Kubernetes Engine (GKE) is a managed container orchestration platform offered by Google Cloud. It enables users to easily deploy, manage, and scale containerized applications using Kubernetes. Kubernetes is an open-source container orchestration tool that automates the deployment, scaling, and management of containerized applications.

With GKE, users can create, manage, and scale Kubernetes clusters, which are groups of virtual machines (VMs) that run containerized applications. GKE handles the underlying infrastructure, including provisioning and managing VMs, and automatically adjusts resources based on demand to ensure that applications run smoothly and efficiently.

GKE provides a number of benefits, including easy management of Kubernetes clusters, automatic scaling and load balancing, high availability, security, and easy integration with other Google Cloud services. Additionally, GKE runs on the same infrastructure and security model that Google uses for its own services, giving users confidence that their applications are running on a secure and reliable platform.

In this article:

  • GKE Architecture
  • How GKE Works
    • Cluster Orchestration with GKE
    • GKE Workloads
    • GKE Networking
    • Modes of Operation
  • GKE Pricing Models and Options
  • Google Kubernetes Engine Best Practices
    • GKE Autoscaling
    • Choose the Right Machine Type
    • Enable GKE Usage Metering
    • GKE Monitoring

GKE Architecture

The architecture of GKE includes several components that work together to provide a scalable and reliable container orchestration platform. Here are the key components:


Control plane
GKE uses a Kubernetes control plane to manage and orchestrate containers. The control plane consists of several components, including the Kubernetes API server, etcd data store, kube-scheduler, and kube-controller-manager. These components are responsible for managing the state of the cluster, scheduling containers, and maintaining the desired state of the system.

Clusters are groups of nodes that run containerized applications and are managed by GKE. Users can create and manage multiple clusters within GKE, each with its own set of nodes, pods, services, and ingress rules. Clusters can be used to separate different environments, such as development, testing, and production, or to isolate different applications within the same organization.

Nodes are VMs that run containerized applications. GKE creates and manages these nodes automatically, and users can specify the number of nodes, machine type, and other configuration options. Nodes run a Kubernetes agent called kubelet, which communicates with the control plane and manages containers running on the node.

Pods are the smallest deployable units in Kubernetes and represent one or more containers running on a node. Pods are used to group containers that need to be co-located, share resources, and communicate with each other.

Containers are lightweight, portable units that package software and all its dependencies, allowing applications to run reliably across different computing environments. Containers provide a consistent runtime environment and enable easy deployment and scaling of applications. GKE uses containers to run applications within pods on nodes. Current GKE node images use containerd as the container runtime; Docker-based node images are no longer supported as of GKE version 1.24, although container images built with Docker still run unchanged.

Services provide a stable network endpoint for accessing pods. Services can be used to load balance traffic across multiple pods and provide a single IP address or DNS name for clients to connect to.

Ingress is a Kubernetes resource that provides external access to services in the cluster. Ingress allows users to define rules for routing traffic to different services based on the request path or host.


Container registry
GKE integrates with Google Cloud's container image registries (Artifact Registry, which supersedes the older Container Registry), where users store and manage the container images used in the cluster. The registry supports the Docker image format. Users push images to it, and cluster nodes pull images from it.

How GKE Works

Here is an overview of how GKE works:

Cluster Orchestration with GKE

GKE clusters use the Kubernetes open source cluster management system to provide the mechanisms for interacting with the cluster. Users can deploy and manage applications, perform administration tasks, set policies, and monitor workload health using Kubernetes commands and resources. Kubernetes draws on Google’s design principles and experience running production workloads in containers, offering benefits such as automatic management, scaling, and rolling updates.

In addition to Kubernetes, GKE provides advanced cluster management features, including Google Cloud’s load balancing for Compute Engine instances, node pools for flexibility, automatic scaling and upgrades for nodes, node auto-repair to maintain availability, and logging and monitoring with Google Cloud’s operations suite for visibility into the cluster. These features allow users to easily manage and scale their containerized applications, without having to worry about the underlying infrastructure.

GKE Workloads

GKE workloads refer to the different types of Kubernetes objects that users can deploy on a GKE cluster to run their applications. GKE supports several types of Kubernetes workloads, each designed for different use cases. Here’s an overview of GKE workloads:

  • Deployments: Manage stateless applications, such as web servers or microservices, that can be scaled up or down based on demand. Deployments define a desired state for the application, including the number of replicas, resource requirements, and network configurations, and Kubernetes automatically handles the deployment and scaling of the application.
  • StatefulSets: Manage stateful applications, such as databases, that require stable network identities and persistent storage. StatefulSets ensure that each pod in the application has a unique identity and that the pods are deployed and scaled in a specific order to maintain the application’s state.
  • Jobs: Run batch or one-time tasks, such as data processing or backups. Jobs are designed to run to completion, and Kubernetes automatically creates and manages the necessary resources for the job, including pods and containers.
  • CronJobs: Schedule periodic tasks, such as backups or data cleanup. CronJobs allow users to define a schedule for running a job, and Kubernetes automatically creates and manages the necessary resources for the job, including pods and containers.
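
To make the Deployment description above more concrete, the following Python sketch builds a minimal Deployment manifest as a dictionary and prints it as JSON, a format that `kubectl apply -f` accepts alongside YAML. The name `web`, the image `nginx:1.25`, the replica count, and the resource requests are illustrative assumptions, not values taken from this article.

```python
import json


def build_deployment(name: str, image: str, replicas: int) -> dict:
    """Return a minimal apps/v1 Deployment manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # Desired state: Kubernetes keeps this many pod replicas running.
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # Illustrative resource requests (assumed values).
                            "resources": {
                                "requests": {"cpu": "250m", "memory": "256Mi"}
                            },
                        }
                    ]
                },
            },
        },
    }


# Hypothetical workload name and image, chosen only for illustration.
manifest = build_deployment("web", "nginx:1.25", replicas=3)
print(json.dumps(manifest, indent=2))
```

Saving the printed output to a file and applying it with `kubectl apply -f` would be one way to use it. Changing `replicas` and reapplying is how the desired state is updated.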

GKE Networking

GKE provides a flexible and customizable networking model that allows you to configure and manage your Kubernetes cluster’s network resources. Here are some key features of GKE networking:

  • Virtual Private Cloud (VPC): GKE clusters run within a VPC network, which provides a secure and isolated environment for your Kubernetes nodes and pods. You can choose to create a new VPC or use an existing one, depending on your needs.
  • Ingress networking: GKE provides an Ingress resource that allows you to expose your Kubernetes Services to the internet using HTTP or HTTPS. You can choose from several Ingress controllers, such as GCP Load Balancing or Nginx, to manage your Ingress networking.
  • Network policies: GKE provides Network Policies that allow you to control the traffic flow between your Kubernetes pods and Services. You can define rules based on IP addresses, ports, and protocols to restrict or allow traffic within your cluster.
  • Cloud Load Balancing: GKE provides integrations with Google Cloud Load Balancing, which allows you to distribute traffic to your Kubernetes Services across multiple regions and zones. You can choose from different load balancing options, such as HTTP(S), TCP, or UDP, depending on your requirements.

Modes of Operation

GKE supports two modes of operation:

  • Standard mode: In this mode, GKE creates and manages nodes for running containerized applications. Nodes run in a customer-managed VPC, providing users with full control over the network configuration and security settings.
  • Autopilot mode: In this mode, GKE provides a fully managed experience, handling the underlying infrastructure, security, and scalability automatically. Autopilot mode includes features such as workload identity, which allows users to securely access Google Cloud services, and node auto-upgrades, which ensure that nodes are always running the latest security patches and updates.

GKE Pricing Models and Options

GKE offers several pricing models, depending on the mode of operation and the features used. Here’s an overview of GKE pricing:

Autopilot mode vs. standard mode
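The main pricing difference between Autopilot mode and Standard mode is how compute resources are billed. In Autopilot mode, you pay for the CPU, memory, and ephemeral storage that your running pods request. In Standard mode, you pay for the Compute Engine VMs that serve as nodes, whether or not your pods use their full capacity. Both modes also incur the per-cluster management fee described below.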

Autopilot mode is a good option for users who want a fully managed experience and prefer predictable costs, while standard mode is a good option for users who want more control over the underlying infrastructure and are willing to manage the cluster resources themselves.

Cluster management fee and free tier
GKE charges a cluster management fee of $0.10 per hour for each cluster, in both Standard and Autopilot modes, in addition to the cost of cluster resources. However, GKE offers a free tier that provides $74.40 in monthly credits per billing account, applicable to zonal and Autopilot clusters.
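
The two figures quoted above are related: $0.10 per hour over a 31-day month comes to exactly $74.40, so the credit covers roughly one eligible cluster. The short Python sketch below works through that arithmetic. It assumes every cluster runs for the whole month and is eligible for the credit, which is a simplification, and the function names are hypothetical.

```python
from decimal import Decimal

MANAGEMENT_FEE_PER_HOUR = Decimal("0.10")  # per cluster, from the text above
FREE_TIER_CREDIT = Decimal("74.40")        # per billing account, per month


def monthly_management_fee(clusters: int, days_in_month: int = 31) -> Decimal:
    """Management fee for `clusters` clusters running the whole month."""
    hours = 24 * days_in_month
    return MANAGEMENT_FEE_PER_HOUR * hours * clusters


def fee_after_credit(clusters: int, days_in_month: int = 31) -> Decimal:
    """Fee left after the free-tier credit, never below zero."""
    fee = monthly_management_fee(clusters, days_in_month)
    return max(fee - FREE_TIER_CREDIT, Decimal("0"))


one_cluster = monthly_management_fee(1)   # 0.10 * 24 * 31
three_clusters_net = fee_after_credit(3)  # fee for three clusters minus the credit
print(one_cluster, three_clusters_net)
```

The sketch covers the management fee only. Compute, storage, and network charges are billed separately.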

Multi-cluster ingress
GKE offers multi-cluster ingress, a feature that allows users to route traffic across multiple clusters. It is priced based on the number of endpoints and ingress objects used.

Backup for GKE
GKE offers a backup feature that allows users to back up and restore their Kubernetes resources, such as Deployments and StatefulSets. Backup pricing is based on the amount of data stored and the frequency of backups.

Pricing calculator
GKE offers a pricing calculator that allows users to estimate the cost of running their containerized applications on GKE. The calculator takes into account factors such as the number of nodes, node type, region, and network egress.

Google Kubernetes Engine Best Practices

GKE Autoscaling

GKE autoscaling refers to the ability of GKE to automatically adjust the amount of resources allocated to a cluster and its workloads based on the demand for containerized applications. GKE supports several types of autoscaling, including:

Horizontal pod autoscaler (HPA)
HPA automatically scales the number of pod replicas in a workload based on observed metrics such as CPU and memory utilization. Users define a target utilization together with a minimum and maximum replica count, and HPA adds or removes replicas to keep observed utilization close to the target. HPA is best suited to stateless applications, such as web servers, that can be easily replicated.
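
The Kubernetes documentation describes the core HPA calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with the result kept between the configured minimum and maximum replica counts. The Python sketch below illustrates only that formula. It leaves out the tolerance band, stabilization windows, and readiness handling that the real controller applies, and the function and parameter names are hypothetical.

```python
import math


def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int,
                     max_replicas: int) -> int:
    """Simplified HPA replica calculation, clamped to the allowed range."""
    ratio = current_utilization / target_utilization
    raw = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, raw))


# Pods average 90% CPU against a 60% target: scale 4 replicas up to 6.
scale_up = desired_replicas(4, 90, 60, min_replicas=2, max_replicas=10)

# Pods average 20% CPU against a 60% target: scale 4 replicas down to 2.
scale_down = desired_replicas(4, 20, 60, min_replicas=2, max_replicas=10)

# The raw result would be 40, but the maximum caps it at 15.
capped = desired_replicas(10, 200, 50, min_replicas=2, max_replicas=15)

print(scale_up, scale_down, capped)
```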

Vertical pod autoscaler (VPA)
VPA adjusts the CPU and memory requests of pods based on their observed usage. It can raise or lower these requests as demand changes, so that pods reserve only the resources they need. VPA is useful for workloads that cannot easily be scaled by adding replicas, which is often the case for stateful applications such as databases.

Cluster autoscaler
Cluster autoscaler adjusts the number of nodes in a cluster based on the demand for containerized applications. It can scale the number of nodes up or down based on factors such as pod scheduling and utilization, allowing users to optimize resource utilization and cost-effectiveness.

Cluster autoscaler can be used to ensure that the cluster has enough resources to handle peak demand, and can also be used to save costs by scaling the cluster down during periods of low demand.
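
As a rough intuition for the scale-up side, the Python sketch below estimates how many new nodes a set of unschedulable pods would need, using first-fit-decreasing bin packing on CPU requests alone. This is not the cluster autoscaler's actual algorithm, which simulates the scheduler across node pools and also weighs memory, affinity rules, and other constraints. The pod requests and the 2000m of allocatable CPU per node are assumed values.

```python
def nodes_needed(pending_pod_cpu_m: list[int], node_allocatable_cpu_m: int) -> int:
    """Estimate new nodes needed for pending pods (CPU in millicores)."""
    free = []  # remaining CPU on each hypothetical new node
    for request in sorted(pending_pod_cpu_m, reverse=True):
        if request > node_allocatable_cpu_m:
            raise ValueError("pod does not fit on this machine type")
        # Place the pod on the first new node with enough room.
        for i, remaining in enumerate(free):
            if request <= remaining:
                free[i] = remaining - request
                break
        else:
            # No existing new node fits, so add another one.
            free.append(node_allocatable_cpu_m - request)
    return len(free)


# Hypothetical pending pods and node size, for illustration only.
pending = [500, 1500, 1000, 250, 750]
estimate = nodes_needed(pending, node_allocatable_cpu_m=2000)
print(estimate)
```

Scale-down works in the opposite direction: when the pods on a lightly used node could fit elsewhere, that node becomes a candidate for removal.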

Choose the Right Machine Type

GKE offers a variety of machine types with different performance characteristics and prices. Here are some considerations when choosing a machine type:

  • Compute workload: The machine type should be selected based on the type of workload the application requires. Workloads that require high amounts of CPU or memory may need higher performance machine types, while those that are more I/O-bound may require higher disk I/O or network bandwidth.
  • Price-performance tradeoffs: Machine types with higher performance characteristics generally come with a higher price tag. Users should consider the tradeoff between performance and price when choosing a machine type. For example, a high-end machine type may provide better performance but may not be cost-effective for all workloads.
  • Preemptible VMs (now called Spot VMs): Preemptible VMs are virtual machine instances that can be purchased at a significantly lower price than standard instances. The caveat is that they run for at most 24 hours and can be interrupted by Google Compute Engine at any time with a 30-second warning. The newer generation of preemptible VMs is called Spot VMs. Spot VMs have the same pricing model but provide additional features; in particular, they have no maximum runtime unless the user sets one. Preemptible/Spot VMs can be a good option for non-critical workloads or those that can handle interruptions gracefully.
  • E2 machine types: This family of VM instances is optimized for general-purpose workloads, offering a balance of compute, memory, and network resources at a lower price point than some other machine types. E2 machine types can be a good option for workloads that require a moderate amount of compute and memory resources.
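
One way to reason about the price tradeoffs above is to compare the monthly cost of the same node pool at two hourly prices. The Python sketch below does this for a hypothetical five-node pool. The hourly prices are placeholders chosen for easy arithmetic, not real Compute Engine or GKE rates, and 730 hours is an assumed average month length.

```python
def monthly_cost(hourly_price: float, nodes: int, hours: int = 730) -> float:
    """Monthly cost of `nodes` VMs at `hourly_price`, rounded to cents."""
    return round(hourly_price * nodes * hours, 2)


# Hypothetical hourly prices, NOT real GKE or Compute Engine rates.
ON_DEMAND_PRICE = 0.10
SPOT_PRICE = 0.03

on_demand = monthly_cost(ON_DEMAND_PRICE, nodes=5)
spot = monthly_cost(SPOT_PRICE, nodes=5)
savings_pct = round(100 * (on_demand - spot) / on_demand, 1)

print(on_demand, spot, savings_pct)
```

A comparison like this ignores the cost of interruptions, so it is most meaningful for workloads that tolerate being restarted.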

Enable GKE Usage Metering

GKE Usage Metering is a feature of Google Kubernetes Engine that provides detailed usage data for GKE clusters and nodes, allowing users to track resource usage and optimize costs. Usage metering collects data on the use of GKE clusters and nodes, including CPU utilization, memory usage, network egress, and persistent disk usage.

With GKE Usage Metering, users can get insights into their GKE usage and optimize their deployments for cost-effectiveness. For example, usage data can be used to identify underutilized resources, such as nodes or persistent disks, that can be scaled down or removed to save costs. Usage data can also be used to identify overutilized resources, such as CPU or memory, that can be scaled up to improve application performance.
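
The kind of analysis described above can be sketched as a comparison of consumed resources against requested resources. The Python example below classifies a few made-up workloads by CPU utilization. The record layout, the workload names, and the 30% and 80% thresholds are all assumptions for illustration and do not reflect the actual schema of the usage metering export.

```python
def classify(usage: float, requested: float,
             low: float = 0.3, high: float = 0.8) -> str:
    """Label a workload by the ratio of used to requested resources."""
    utilization = usage / requested
    if utilization < low:
        return "underutilized"
    if utilization > high:
        return "overutilized"
    return "ok"


# Hypothetical sample records (CPU in cores), not real metering output.
samples = [
    {"workload": "frontend", "cpu_requested": 4.0, "cpu_used": 0.8},
    {"workload": "api", "cpu_requested": 2.0, "cpu_used": 1.9},
    {"workload": "worker", "cpu_requested": 2.0, "cpu_used": 1.0},
]

report = {
    s["workload"]: classify(s["cpu_used"], s["cpu_requested"]) for s in samples
}
print(report)
```

In this sample, `frontend` would be a candidate for smaller requests and `api` for larger ones.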

Usage metering data is available through Google Cloud Billing Reports and can be exported to BigQuery for further analysis. GKE Usage Metering is available for both autopilot and standard modes of operation and can help users optimize their GKE deployments for performance and cost-effectiveness.

GKE Monitoring

Google Kubernetes Engine (GKE) provides integrations with monitoring and observability tools such as Cloud Logging and Cloud Monitoring. GKE also offers Google Cloud Managed Service for Prometheus, which enables you to monitor and alert on your workloads without the need for manual management of Prometheus.

Cloud Operations for GKE can monitor GKE clusters, providing a customized dashboard for managing and monitoring the clusters. The dashboard enables you to view key metrics like CPU and memory utilization, open incidents, and inspect different components like namespaces, nodes, workloads, services, pods, and containers, as well as view metrics and log entries for pods and containers over time.


Ensure availability and optimize Google Kubernetes Engine with Spot by NetApp

Spot by NetApp’s portfolio provides hands-free Kubernetes optimization. It continuously analyzes how your containers use infrastructure and automatically scales compute resources to maximize utilization and availability, using an optimal blend of spot, reserved, and on-demand compute instances.

  • Dramatic savings: Access spare compute capacity for up to 91% less than pay-as-you-go pricing
  • Cloud-native autoscaling: Effortlessly scale compute infrastructure for both Kubernetes and legacy workloads
  • High-availability SLA: Reliably leverage Spot VMs without disrupting your mission-critical workloads