Trax chooses Elastigroup by Spot to provision, manage and scale infrastructure on Google Cloud
Trax’s system captures and analyzes images of shelves at retail stores, then produces actionable data about how best to organize, price and promote the products on those shelves. The system is powered by deep learning algorithms that process millions of images each month, performing tasks such as image recognition and stitching shelf images together. This workload consumes a large number of CPU and GPU hours, and that capacity needs to scale with demand.
The Trax team needed a dynamically scaling multi-cloud infrastructure whose costs would not run wild. To keep costs under control, Trax chose to use Preemptible VMs on Google Cloud.
Preemptible VMs are Google Cloud’s excess compute capacity. Google offers these VMs at a steep discount, but they can be interrupted at a moment’s notice. Preemptible VMs solved the cost problem, but their ephemeral nature created another. The Trax system now needed to handle scaling, integrate with multiple Google Cloud services such as Pub/Sub and Backend Services, and run image recognition workloads reliably on Preemptible VMs. Since the Trax team had already been using Elastigroup by Spot to take on similar challenges on AWS, leveraging Spot’s support for Google Cloud was the natural move.
Trax’s system receives images and distributes them to multiple VMs for processing. Their multi-cloud infrastructure scales automatically based on the Pub/Sub queue depth. To prevent VMs that are actively processing images from being terminated during scaling actions, Trax uses Elastigroup’s Lock/Unlock feature, which protects VMs from being modified for a duration of time. In this case, VMs are locked for the duration of the message timeout.
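The interaction between queue-based scaling and locking can be illustrated with a minimal sketch. It is plain Python and does not call the Elastigroup or Pub/Sub APIs; the names `Worker`, `desired_workers`, `pick_scale_in_victims` and the `messages_per_worker` parameter are hypothetical and chosen only for illustration. It assumes the target capacity is derived from queue depth and that a locked VM is never selected for removal until its lock expires.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Worker:
    """A VM that processes images (hypothetical model, not an Elastigroup type)."""
    name: str
    locked_until: float = 0.0  # timestamp in seconds; 0.0 means unlocked

    def lock(self, now: float, ttl_seconds: float) -> None:
        # Protect the VM for ttl_seconds, e.g. the message timeout.
        self.locked_until = now + ttl_seconds

    def unlock(self) -> None:
        self.locked_until = 0.0

    def is_locked(self, now: float) -> bool:
        return now < self.locked_until


def desired_workers(queue_depth: int, messages_per_worker: int,
                    min_workers: int, max_workers: int) -> int:
    """Derive a target VM count from queue depth, clamped to group limits."""
    needed = math.ceil(queue_depth / messages_per_worker)
    return max(min_workers, min(max_workers, needed))


def pick_scale_in_victims(workers: List[Worker], target: int,
                          now: float) -> List[Worker]:
    """Choose VMs to remove when scaling in, skipping every locked VM."""
    surplus = len(workers) - target
    if surplus <= 0:
        return []
    unlocked = [w for w in workers if not w.is_locked(now)]
    return unlocked[:surplus]
```

In this sketch a worker would call `lock` when it pulls a message and `unlock` when it acknowledges it. If fewer unlocked VMs exist than the scale-in requires, the group simply shrinks by less until the remaining locks expire.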
Auto Scaling that Gets Smarter
As Trax’s system locks and unlocks VMs, Elastigroup learns how long they need to run. Elastigroup then uses this information to keep scaling fast and cost-effective, choosing VMs that are likely to run for that length of time without being interrupted.
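The article does not describe how Elastigroup performs this learning internally, so the following is only a rough sketch of the general idea under stated assumptions: observed lock durations are summarized with a nearest-rank quantile, and the cheapest VM type whose expected uninterrupted runtime covers that figure is selected. The `VmType` fields, the function names and every number used with them are hypothetical.

```python
import math
from typing import List, NamedTuple, Optional


class VmType(NamedTuple):
    """Hypothetical description of a candidate VM type."""
    name: str
    hourly_price: float
    expected_uptime_s: float  # assumed estimate of uninterrupted runtime


def required_runtime(lock_durations: List[float], quantile: float = 0.95) -> float:
    """Nearest-rank quantile of the observed lock durations, in seconds."""
    if not lock_durations:
        raise ValueError("no lock durations observed yet")
    ordered = sorted(lock_durations)
    rank = max(1, math.ceil(quantile * len(ordered)))
    return ordered[rank - 1]


def choose_vm_type(candidates: List[VmType],
                   runtime_needed_s: float) -> Optional[VmType]:
    """Cheapest candidate expected to outlast the required runtime, if any."""
    viable = [c for c in candidates if c.expected_uptime_s >= runtime_needed_s]
    if not viable:
        return None
    return min(viable, key=lambda c: c.hourly_price)
```

A high quantile is used rather than the mean so that the occasional long-running job is still likely to finish before an interruption.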
Here are the integrations and services Trax uses to make their system run smoothly while keeping costs low:
Elastigroup by Spot – Cluster orchestration software that enables Trax to run their workloads reliably on Preemptible VMs. Elastigroup predicts interruptions ahead of time and redistributes workloads to maintain maximum availability at minimum cost. To ensure uptime, Elastigroup can run workloads on a mix of On-Demand and Preemptible VMs.
Backend Services – Elastigroup by Spot integrates with Google Cloud’s load balancing solution, Backend Services. When scaling VMs or replacing unhealthy ones, Elastigroup automatically registers and deregisters the machines with the load balancer. This way the load balancer is able to continue to distribute traffic seamlessly while Elastigroup optimizes the underlying compute. Trax leverages Elastigroup’s integration with both global and regional backend services.
Pub/Sub – Trax uses Elastigroup’s integration with Pub/Sub to control infrastructure scaling. Elastigroup’s scaling policies can use a variety of metrics, custom or predefined. Trax has configured its Elastigroup to base auto-scaling on Pub/Sub queue depth.
Lock/Unlock VMs – Trax uses Elastigroup’s Lock/Unlock feature to protect active VMs from interruptions during scaling actions.
Full visibility – The Elastigroup dashboard gives the Trax team deep visibility into server performance, health, availability, and cost, with a consistent experience and a single pane of glass across AWS and GCP.
Blue/Green Deployment – To keep their application updated, patched and secure, Trax uses Elastigroup’s Deployment feature. At configured intervals, Elastigroup spins up new, updated servers and terminates the existing group only once the health of the new servers has been verified.
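The blue/green flow in the last item can be sketched in a few lines. This is an illustrative outline of the pattern, not Elastigroup’s implementation or API: `launch_new`, `is_healthy` and `terminate` are hypothetical callbacks standing in for whatever the platform provides, and the rollback behavior when a health check fails is an assumption.

```python
from typing import Callable, List


def blue_green_deploy(old_group: List[str],
                      launch_new: Callable[[], List[str]],
                      is_healthy: Callable[[str], bool],
                      terminate: Callable[[str], None]) -> List[str]:
    """Replace old_group with freshly launched VMs only if all pass health checks.

    Returns the list of VMs that is serving after the deployment attempt.
    """
    new_group = launch_new()  # the "green" VMs, running the updated image

    if new_group and all(is_healthy(vm) for vm in new_group):
        # Green is verified: only now is the old ("blue") group terminated.
        for vm in old_group:
            terminate(vm)
        return new_group

    # Assumed rollback: discard the green VMs and keep serving from blue.
    for vm in new_group:
        terminate(vm)
    return old_group
```

The key property is ordering: the existing group is left untouched until every new VM has passed its health check, so a bad image never takes capacity away from the running service.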
The Result: Reliable Use of Preemptible VMs in Production at About 80% Lower Cost
By running their complex environments on Elastigroup, Trax has built a dynamic, multi-cloud solution that scales automatically with demand. Reliable use of Google Preemptible VMs, combined with Elastigroup’s smart provisioning features, has reduced costs by about 80%.