As data scientists and engineers in a fast-paced corporate environment, we face numerous challenges every day. One of the most daunting is managing infrastructure and its costs. Each task or model we work on must be scheduled carefully, cost-optimized, and given the right CPU and memory, and we also have to keep track of its dependencies. None of this surrounding work improves the model itself, yet it takes time away from it: by our rough estimate, infrastructure management and optimization consume about 20 to 30% of our time.
This is where Apache Airflow and Spot Ocean come into play!
We decided to move our processes off high-cost on-demand instances and onto cost-effective spot instances in Amazon EKS, with Apache Airflow orchestrating the workloads. To manage this setup efficiently, we used Spot Ocean for EKS.
The results were striking. We cut our spending on the EKS cluster by 50% without compromising availability. In addition, Spot Ocean's automated optimization of our Kubernetes infrastructure simplified cluster management and freed up our time for more productive work.
Let’s dive deeper into how we accomplished this:
1. Managing dependencies with Apache Airflow
Every task or model we handle has dependencies, often in the form of multiple data sources that are updated daily and may depend on one another. Apache Airflow lets us model these dependencies explicitly, so that tasks run in a logical sequence: a sub-task executes only after all of its upstream tasks have completed successfully. Airflow's UI lets us track task execution and see at a glance which processes succeeded or failed.
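To make the dependency rule concrete, here is a minimal sketch in plain Python rather than Airflow itself, so it runs without an Airflow installation (it needs Python 3.9+ for `graphlib`). The task names (`extract_sales`, `extract_weather`, `build_features`, `train_model`) and the helper `run_pipeline` are hypothetical. It visits tasks in topological order and runs a task only if every upstream task succeeded, which mirrors Airflow's default `all_success` trigger rule; otherwise the task is marked `upstream_failed`. In a real Airflow DAG, the same shape would be declared as `[extract_sales, extract_weather] >> build_features >> train_model`.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "extract_sales": set(),
    "extract_weather": set(),
    "build_features": {"extract_sales", "extract_weather"},
    "train_model": {"build_features"},
}


def run_pipeline(dependencies, failed=frozenset()):
    """Simulate one pipeline run and return each task's final state.

    Tasks are visited so that every upstream task comes first. A task runs
    only if all of its upstream tasks succeeded; otherwise it is marked
    "upstream_failed" and never executes. Tasks listed in `failed` are
    pretended to fail when they run.
    """
    states = {}
    for task in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(task, set())
        if any(states[u] != "success" for u in upstream):
            states[task] = "upstream_failed"
        elif task in failed:
            states[task] = "failed"
        else:
            states[task] = "success"
    return states
```

For example, `run_pipeline(DEPENDENCIES, failed={"extract_weather"})` still lets `extract_sales` succeed but marks `build_features` and `train_model` as `upstream_failed`, because they never receive the complete input they depend on. `static_order()` also raises `graphlib.CycleError` if the dependencies contain a cycle, the same kind of mistake Airflow refuses to schedule.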
2. Optimized resource allocation
By specifying CPU and memory requirements through the built-in resource settings of each task, we make sure every pod requests exactly the resources it needs to run. This guards against both downtime caused by insufficient resources and excess cost caused by overprovisioning.
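The post does not spell out which task settings carry these requirements. In Airflow on Kubernetes they are commonly passed through the KubernetesPodOperator or the KubernetesExecutor's per-task `executor_config`, and either way they end up as the `requests` and `limits` of a container's `resources` in the pod spec. The sketch below, built around hypothetical helpers `parse_cpu`, `parse_memory`, and `task_resources`, only assembles that block as a plain dictionary and rejects limits below requests, which the Kubernetes API server would also refuse. It is not an Airflow API.

```python
import re

# Multipliers for Kubernetes memory quantity suffixes (binary and decimal).
_MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as "500m" or "2" to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert an integer memory quantity such as "512Mi" or "2Gi" to bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti|k|M|G|T)?", quantity)
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    number, suffix = match.groups()
    return int(number) * _MEMORY_SUFFIXES.get(suffix, 1)


def task_resources(cpu_request, memory_request, cpu_limit=None, memory_limit=None):
    """Build the `resources` block of a Kubernetes container spec.

    Limits default to the requests. Limits below requests are rejected,
    as the Kubernetes API server would also reject them.
    """
    cpu_limit = cpu_limit or cpu_request
    memory_limit = memory_limit or memory_request
    if parse_cpu(cpu_limit) < parse_cpu(cpu_request):
        raise ValueError("CPU limit is lower than the CPU request")
    if parse_memory(memory_limit) < parse_memory(memory_request):
        raise ValueError("memory limit is lower than the memory request")
    return {
        "requests": {"cpu": cpu_request, "memory": memory_request},
        "limits": {"cpu": cpu_limit, "memory": memory_limit},
    }
```

Calling `task_resources("500m", "2Gi")` reserves half a core and 2 GiB and caps the container at the same amounts. When every container in a pod has requests equal to limits, Kubernetes assigns the pod the Guaranteed QoS class; the trade-off is that a container exceeding its memory limit is OOM-killed, so the numbers should reflect the task's real peak usage.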
3. Cost-effective EKS clusters
Compared with EC2 instance groups, EKS clusters are more cost-effective because they make better use of resources: containers can be packed densely onto each EC2 instance, so fewer instances are needed to run the same workload. The cluster can also scale with demand, adding or removing capacity dynamically to avoid waste.
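The effect of denser packing can be illustrated with first-fit decreasing, a classic bin-packing heuristic. This is a simplified model, not how the Kubernetes scheduler or Spot Ocean actually places pods: it considers CPU only, assumes identical nodes, and ignores memory, affinity rules, and system overhead. The function name and the job sizes are hypothetical, and CPU is expressed in millicores so the arithmetic stays exact.

```python
from typing import List


def first_fit_decreasing(pod_millicores: List[int], node_millicores: int) -> List[List[int]]:
    """Pack pod CPU requests onto identical nodes with first-fit decreasing.

    Pods are placed largest first, each on the first node with enough free
    CPU; a new node is opened only when no existing node fits the pod.
    Returns one list of pod requests per node.
    """
    nodes: List[List[int]] = []
    free: List[int] = []
    for request in sorted(pod_millicores, reverse=True):
        if request > node_millicores:
            raise ValueError(f"a {request}m pod cannot fit on a {node_millicores}m node")
        for i, remaining in enumerate(free):
            if request <= remaining:
                nodes[i].append(request)
                free[i] -= request
                break
        else:  # no existing node had room, so open a new one
            nodes.append([request])
            free.append(node_millicores - request)
    return nodes


# Eight hypothetical jobs (0.5 to 3 cores each) packed onto 4-core nodes.
jobs = [2500, 1500, 1000, 1000, 500, 500, 2000, 3000]
packed = first_fit_decreasing(jobs, node_millicores=4000)
```

Running one job per instance would take eight instances here, while packing puts the same 12 cores of requests onto three 4-core nodes with no idle CPU.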
4. Real-time cloud consumption monitoring with Spot Ocean
Spot Ocean lets us track our cloud consumption costs (such as CPU and memory) and optimizes nodes in real time based on the pods' current requirements. This gives us a better understanding of workload behavior and clear visibility into the resource cost of each model.
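Spot Ocean computes its cost breakdown itself, and the post does not describe its method. As a hypothetical illustration of what per-model cost visibility means, the sketch below splits a node's hourly price between pods according to each pod's share of the node's requested CPU and memory, weighting the two equally by default. The `Pod` fields, the 50/50 weighting, and the node price are assumptions, not Spot Ocean's formula or real AWS prices; capacity no pod requests is left unallocated.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pod:
    name: str
    model: str            # which model or job the pod belongs to
    cpu_millicores: int   # CPU request
    memory_mib: int       # memory request


def allocate_node_cost(
    pods: List[Pod],
    node_price_per_hour: float,
    node_cpu_millicores: int,
    node_memory_mib: int,
    cpu_weight: float = 0.5,
) -> Dict[str, float]:
    """Return the hourly cost attributed to each model on one node.

    Each pod is charged for a weighted mix of its CPU share and memory
    share of the node. Capacity no pod requests is not charged to anyone.
    """
    costs: Dict[str, float] = {}
    for pod in pods:
        cpu_share = pod.cpu_millicores / node_cpu_millicores
        memory_share = pod.memory_mib / node_memory_mib
        share = cpu_weight * cpu_share + (1 - cpu_weight) * memory_share
        costs[pod.model] = costs.get(pod.model, 0.0) + share * node_price_per_hour
    return costs


# Hypothetical example: a 4-core, 16 GiB node at a made-up $0.40/hour.
pods = [
    Pod("train-churn-1", "churn", cpu_millicores=2000, memory_mib=8192),
    Pod("score-forecast-1", "forecast", cpu_millicores=1000, memory_mib=2048),
]
costs = allocate_node_cost(pods, 0.40, node_cpu_millicores=4000, node_memory_mib=16384)
```

In this example the `churn` pod is charged $0.20 per hour and the `forecast` pod about $0.075; the remaining $0.125 is idle capacity, which is the kind of waste that right-sizing pods and optimizing nodes aims to shrink.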
The combination of Apache Airflow and Spot Ocean has transformed our approach to infrastructure management and cost optimization. We no longer waste valuable time chasing infrastructure optimization tasks, and we have substantially reduced our costs without any impact on performance. As we continue to explore these tools, we look forward to finding even more ways to improve our efficiency and productivity.
If your work involves large data models, AI training, or ML workloads on AKS, EKS, or GKE, try Spot Ocean or schedule a demo to see how it can optimize your cloud resources and make your team more efficient.