Managing container environments, such as Kubernetes data planes and Amazon ECS, requires more than achieving cost savings. Yes, cost is a vital component of optimizing your containers. But you also need to optimize for availability and usage, all while maintaining a clear visualization of how your costs break down across compute, network, and storage.
Some engineering teams may opt for open-source solutions to the various challenges of scaling in the cloud, hoping to save the expense of building or buying a new tool. But a solution's cost is more than its price. The total cost of ownership (TCO), including the setup and maintenance of a solution, must be considered when assessing return on investment (ROI), even for "free" open-source solutions. Above all, open-source solutions tend to be narrowly scoped: each one usually addresses a single point challenge. For passionate engineers, onboarding and integrating multiple open-source solutions into a coherent workflow might be a fun intellectual challenge. But from an organizational perspective, it's nothing short of a disaster: not just for the time spent, but for the enormous risk of embedding shadow IT, full of unmanaged dependencies, in your holy of holies, namely your IT infrastructure.
A scalable, practical, enterprise-grade solution should offer all-in-one container management that binds optimization techniques with visibility, analytics, automation, and multi-cloud support. However, lean DevOps teams shouldn’t commit to a DIY or open-source autoscaler before they have a clear understanding of the overlooked aspects of container optimization.
Container optimization basics
Container architectures are far more complex than instances and virtual machines (VMs) because they allow nearly unlimited granularity. So although container architectures are, in theory, less compute-consuming than monoliths, without a comprehensive optimization strategy their complexity and cost can easily spin out of control. That, in turn, limits your scale potential, because you won't have the compute budget or engineering hours to keep up.
On a basic level, there are four key pillars of container optimization, and steps you can take to ensure the success of each.
1. Infrastructure optimization
When optimizing containers, you want to take infrastructure into account just as you would with your overall cloud optimization strategy. Optimize your containers for:
- Availability & consistency: Avoid downtime and reduce the impacts of a potential failure by using tactics like spreading pods across nodes and ensuring fallbacks.
- Performance: Monitor memory and CPU/GPU capacity. With the growing popularity of artificial intelligence (AI) and machine learning, AI/ML-driven container workloads can benefit greatly from hardware acceleration, namely increased GPU capacity. Autoscaling up and down can help you avoid outages or wasteful overprovisioning.
- Cost: Pay attention to pricing models and leverage purchase commitments and discounted compute.
- Latency: Make sure your data is in the region closest to the user to avoid delays in data transfers.
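To make the availability tactic above concrete, here is a minimal sketch of spreading pods across nodes in Kubernetes using `topologySpreadConstraints`. The names `my-app` and its image are placeholders, not part of the original text:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                                    # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                              # at most 1 pod of difference between nodes
          topologyKey: kubernetes.io/hostname     # spread across individual nodes
          whenUnsatisfiable: DoNotSchedule        # hard rule; use ScheduleAnyway for a soft preference
          labelSelector:
            matchLabels:
              app: my-app
      containers:
        - name: my-app
          image: my-app:latest                    # placeholder image
```

With this constraint in place, losing a single node takes down at most one replica instead of all three.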
2. Application, OS, and cluster-level optimization
Optimizing containers must be done at a more granular level than you might be used to. While some aspects of optimization can use the rightsizing techniques you'd apply to instances or virtual machines, container optimization also uses a process known as bin packing.
Put simply, bin packing places workloads into nodes in the most resource-efficient way possible. By consolidating these resources, bin packing reduces the total number of worker nodes by de-fragmenting under-utilized nodes. When done effectively, bin packing creates well-utilized nodes and can greatly reduce the costs of operating containers.
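To illustrate the idea, here is a toy sketch of bin packing using the classic first-fit-decreasing heuristic, with pods reduced to a single CPU-request number. Real schedulers also weigh memory, affinity rules, and disruption budgets; this is an assumption-laden simplification, not how any particular autoscaler is implemented:

```python
def pack_pods(pod_requests, node_capacity):
    """Assign pod CPU requests (in cores) to as few fixed-size nodes as possible."""
    nodes = []        # free capacity remaining on each provisioned node
    placements = []   # (request, node_index) pairs
    for request in sorted(pod_requests, reverse=True):  # place largest pods first
        for i, free in enumerate(nodes):
            if request <= free:          # first existing node with room wins
                nodes[i] -= request
                placements.append((request, i))
                break
        else:                            # no existing node fits: provision a new one
            nodes.append(node_capacity - request)
            placements.append((request, len(nodes) - 1))
    return len(nodes), placements

# Six pods totalling 8 cores consolidate onto two 4-core nodes
# instead of six under-utilized ones.
count, _ = pack_pods([2.0, 2.0, 1.5, 1.5, 0.5, 0.5], node_capacity=4.0)
print(count)  # → 2
```

The same consolidation logic, applied continuously as workloads churn, is what lets an optimizer de-fragment under-utilized nodes and retire the empties.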
3. Visibility and analytics
There’s a saying in business that you can’t manage what you can’t measure. Similarly, you can’t analyze what you can’t visualize. To take meaningful action on your container utilization and spend, it’s not enough to have that data logged “somewhere.” You need a graphic presentation of the data in front of you. And so do the non-technical cloud stakeholders to whom you need to justify your compute consumption and costs.
Cloud stakeholders need intuitive visualization to do the analytical aspect of their jobs, such as:
- associating spend with business units, teams, and line items
- identifying and forecasting the compute needs of different products and projects
- surfacing waste pockets, inefficient cost centers, and streamlining opportunities
Without visualization built into your container management, you cannot possibly communicate your needs and achievements to those functions. Needless to say, if your solution is a patchwork of open-source tools, you’ll need to develop that visualization in-house. Unfortunately, most teams don’t have the time, resources, or skills to pull that off.
4. Automation and integration
Container optimization is an ongoing process, not a one-and-done activity. True optimization does not rely on manual efforts to efficiently scale container usage, as this takes more time and ultimately delivers less savings. Instead, engineering teams need to take advantage of automation and integration with existing tools to continuously manage the scale and sizing of their clusters based on container resource needs.
Developing a container optimization strategy: What Docker, K8s, and cloud provider autoscalers can and cannot do
Kubernetes provides autoscaling functionality that can automatically adjust the number of containers running based on demand. This can help optimize resource utilization and reduce costs by running only the necessary number of containers. However, like cloud providers’ native tools, K8s autoscalers have their limitations.
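As a minimal sketch of that built-in functionality, a HorizontalPodAutoscaler can scale a deployment on CPU utilization. The `my-app` names are placeholders:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa        # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app          # placeholder target deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add pods when average CPU exceeds 70%
```

Note that this adjusts pod counts against a demand signal; it says nothing about which instance types those pods land on, or at what price, which is exactly the gap described below.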
Dockerfiles are where an application’s structure and capacity requirements are declared. Kubernetes (K8s) executes that specification — namely, deploying the Dockerized application — in a straightforward manner. Its native Cluster Autoscaler simply determines the required capacity (such as instance hours) and leaves execution to the cloud provider’s internal autoscalers. The cloud provider’s autoscalers can increase or decrease the number of nodes and pods when workloads are queued or nodes sit idle, but that’s it. Both Kubernetes and cloud provider autoscalers equate consumption with quantity of resources rather than maximizing efficiency with regard to machines and costs.
Additionally, the Kubernetes Cluster Autoscaler is cloud agnostic and lacks the tight integration with AWS or Azure often sought by engineering teams. As a result, its ability to leverage different instance sizes or types to optimize costs is very limited.
Why have a third-party optimizer: What hyperscaler optimizer tools can and cannot do
Cloud providers offer managed K8s services with some cluster management functions. These usually sit more on the reporting side of optimization than on the proactive action side. See, for example, AWS’s EKS cluster management capabilities. With Azure’s AKS, you’ll find optimization guidelines in the Azure Well-Architected Framework. Bottom line: with both services you receive recommendations that you must implement manually as you design and configure your K8s clusters.
Some cloud providers adopted open-source Kubecost as their cost analysis framework, but this is yet another source of data and recommendations. What you really want is something to optimize for you – a container optimization tool capable of automated actions.
What types of container optimizers are there?
Container optimization tools are roughly divided into two categories: open-source tools, and enterprise suites offered by commercial vendors.
Open-source solutions tend to support a single cloud provider exclusively (e.g., Karpenter for AWS). Others, like Koku (sponsored by Red Hat), support several K8s platforms. You may also run into so-called “open-source container cost optimization” solutions, like Kubecost or OpenCost, but these are in fact reporting and analysis tools.
What’s in a “free” solution’s TCO?
Open-source tools are favored by engineers for being free to use and for the technical curiosity they rouse with their DIY implementation and customization. But this comes at a price, especially when you operate at scale. The hidden costs of these “free” tools can add up:
- Cost to scale, cost to maintain: Open-source or DIY solutions are not “fire and forget.” They take time-consuming engineering work to set up and maintain, including many third-party integrations to account for missing, yet necessary, capabilities. This, of course, requires a deep understanding to configure correctly per cluster. What’s more, you’ll have to update these configurations manually as your product and workloads evolve. The headache increases when your cloud-native estate is complex and heterogeneous, and you must configure differentially for different clusters, clouds, environments, and workloads. Eventually, you’ll be running way fewer clusters than needed just to stay in control — damaging both efficiency and customer experience.
- No UI means no cost visibility: With open-source or DIY container optimization tools, you have limited visibility into savings. You need to put effort into analysis to show your achievements and measure them going forward, and that’s harder to do without built-in tools for reporting and visualization. This is especially important in those organizations where Dev teams own their clusters and need a UI to manage them.
- Poor governance from missing enterprise capabilities: Lack of permission management, RBAC, SSO, notifications and a single platform to tie it all together makes open-source solutions unsafe for anything involving intellectual property, user data, or organizational expenses. Remember you always want to protect your infrastructure from insider and outsider threats alike.
- Templates and automations don’t come standard with open source: Your Kubernetes clusters multiply faster than your DevOps team grows. Soon follows the pressing need to automate away the inherent complexity of managing containers. The temptation to deploy free open-source tools is great when your compute budget is already tight. Yet open-source tools rarely automate the tasks that take up most of your time, like bin packing and AMI updates. Should teams decide to build rather than download, that could mean even more time lost. Automated features of SaaS solutions are often more robust and might include auto healing, rightsizing, cluster rolls, fallback to on-demand, reinstatement of spot instances when possible, and more.
Choosing your third-party optimizer: Open source or SaaS?
While open source can be a short-term fix, engineering teams need to consider their longer-term needs. A SaaS (software-as-a-service) suite dedicated to container optimization can provide additional benefits that open-source solutions don’t offer, such as:
- Dedicated dev, product, and customer service support
- Service-level agreement (SLA) for availability and other services
- Robust feature sets with built-in capabilities (e.g., visibility, security, etc.)
- Out-of-the-box setup, with less need for self-made configurations and maintenance
Wondering how container optimization solutions compare? Get an in-depth comparison of DIY and managed suite container optimization solutions for AWS.