The Challenge
One of our motorsports strategy products requires a lot of horsepower on ‘race day’. Every weekend we need approximately 16 additional cores and 32 GB of additional memory beyond what the rest of the week requires.
Initially, we solved this by purchasing machines sized to handle the peak load. Of course, this solution proved quite expensive, even with the reserved-instance pricing model. What’s more, it kept us from pushing the limits of our real-time processing for fear of exceeding the servers’ specs.
Why Spot
All of our applications are built as modular Docker containers, and we use Rancher to manage our infrastructure and applications. Near the end of 2015, we heard about Spot during a Rancher virtual meetup and immediately saw the potential. The idea was to purchase only the machines needed to meet our weekday baseline and then use Spot to scale up new ‘worker’ machines to handle the peak load. Paired with Rancher, our workers auto-scale onto the newly available machines, automatically connect to RabbitMQ, Postgres, and Redis, and start consuming available jobs immediately.
Using Spot, we reduced our server costs by about 70% compared with on-demand pricing. This architecture cuts our costs by nearly 50% even compared with reserved-instance pricing, without any long-term commitment. Spot provides a straightforward way to handle the complexities of bidding, provisioning, and replacing expired spot instances, all without any downtime.
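As a rough sanity check, the two savings figures can be related to each other. The sketch below is illustrative only: it assumes both percentages describe the same workload and the same total bill, which the figures above do not state explicitly, and the function name is ours, not part of any Spot or Rancher tooling.

```python
def implied_reserved_ratio(saving_vs_on_demand: float,
                           saving_vs_reserved: float) -> float:
    """Reserved cost as a fraction of on-demand cost, implied by the two savings.

    Assumption (not stated in the text): both savings refer to the same bill.
    """
    # Our cost as a fraction of the on-demand cost.
    our_cost = 1.0 - saving_vs_on_demand
    # Our cost is also (1 - saving_vs_reserved) times the reserved cost,
    # so dividing gives the reserved cost relative to on-demand.
    return our_cost / (1.0 - saving_vs_reserved)


# Approximate figures from the text: about 70% and nearly 50%.
ratio = implied_reserved_ratio(0.70, 0.50)
print(f"Reserved pricing would be about {ratio:.0%} of on-demand")
```

Under that assumption, the figures imply reserved pricing of roughly 60% of the on-demand price, that is, a reserved discount of around 40%.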
The notion of ‘spot instances’ is initially difficult for a lot of companies to understand. At Rho AI, we found a clear use case for spot instances in one of our key products, and it helps us control costs without compromising performance.
Rho AI partners with organizations to identify and maximize the value of their unique data by leveraging the power of computer science, statistics, and machine learning.
https://rho.ai/