Developer freedom is important to On the dot. The On the dot team have the ability to deploy and scale fast, yet this enhanced agility for the dev team comes with a trade-off: increasingly high costs. Don Tran, Head of Platform Services at On the dot, needed a way to combat the high costs without undermining the freedom and agility of the developer team.
Spot instances seemed a great way for On the dot to reduce these costs, but the On the dot infrastructure, and particularly the ECS instances backed by Auto Scaling groups, was somewhat bespoke and not immediately compatible with spot instances.
For the On the dot team, rearchitecting their environment to suit spot instances might have been possible, but it would have required an immense number of developer man-hours. The team’s focus would always need to remain on building and maintaining the core business. Put simply, the team didn’t have the man-hours to spare for the huge project of adapting their infrastructure to use spot instances.
On the dot’s Platform Engineering team met with Spot at one of the monthly workshops run by the Spot team in London. After the session with one of Spot’s top Solution Architects, On the dot and Spot began discussing in more detail how the Spot platform could allow On the dot to utilize spot instances without creating more work for the team.
On the dot’s setup required support for ECS Task Placement Constraints in Spot’s ECS integration and Autoscaler. Within days the ECS Task Placement Constraints feature was added to the Spot integration and On the dot were ready to migrate. With the new feature available to the team, the first environments to migrate over (dev and testing) were running on Spot after only an afternoon’s work.
Through Spot’s ECS integration, On the dot were able to achieve savings of 73% and now have nearly all applicable workloads running on Spot.
Spot’s ECS autoscaler features intelligent scaling, as Spot makes sure to utilize the best scaling practices to maximise cluster efficiency:
Headroom – a buffer of spare capacity (in terms of memory and CPU) is provisioned so that there is no need to wait for new instances when scaling up, whilst ensuring that instances do not become over-utilized.
Smart Scaling Down – Elastigroup by Spot monitors the cluster for idle instances which remain underutilized for a specified number of consecutive periods. Once one is identified, Elastigroup finds spare capacity on other instances, drains the idle instance’s tasks and reschedules them on those instances before terminating the idle instance.
Tetris Scaling – Elastigroup records the events written when an ECS task is pending and analyzes why the task has yet to start (e.g. Insufficient Memory / CPU, No Ports Available). It then spins up instances inside the customer’s cluster which resolve the issues with the pending task, allowing better optimization for containerized environments without any additional input from developer teams.
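To make the three behaviours above more concrete, here is a rough Python sketch of the kind of decisions they describe. It is not Spot's implementation: every name in it (`Task`, `Instance`, `instances_to_add`, `removable_instance`) is hypothetical, headroom is modelled as placeholder tasks, placement is a simple first-fit on CPU and memory only, and a single instance size is assumed for scale-up.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    cpu: int      # CPU units requested
    memory: int   # MiB requested


@dataclass
class Instance:
    cpu: int
    memory: int
    tasks: list = field(default_factory=list)
    idle_periods: int = 0  # consecutive periods observed as underutilized

    def fits(self, task):
        used_cpu = sum(t.cpu for t in self.tasks)
        used_mem = sum(t.memory for t in self.tasks)
        return task.cpu <= self.cpu - used_cpu and task.memory <= self.memory - used_mem


def _copy(instances):
    # Work on copies so simulations never mutate the real cluster state.
    return [Instance(i.cpu, i.memory, list(i.tasks), i.idle_periods) for i in instances]


def first_fit(tasks, instances):
    """Place tasks first-fit onto instances (mutating them); return those left over."""
    leftover = []
    for task in tasks:
        target = next((i for i in instances if i.fits(task)), None)
        if target is None:
            leftover.append(task)
        else:
            target.tasks.append(task)
    return leftover


def instances_to_add(instances, pending, headroom, cpu, memory):
    """Scale-up: new instances of size (cpu, memory) needed for pending tasks plus headroom."""
    sim = _copy(instances)
    # Headroom is modelled as placeholder tasks that must also fit somewhere.
    leftover = first_fit(list(pending) + list(headroom), sim)
    new = []
    for task in leftover:
        if not first_fit([task], new):
            continue  # task fitted on an instance we already decided to add
        fresh = Instance(cpu, memory)
        if not fresh.fits(task):
            raise ValueError("task does not fit the chosen instance size")
        fresh.tasks.append(task)
        new.append(fresh)
    return len(new)


def removable_instance(instances, idle_threshold):
    """Scale-down: index of an idle instance whose tasks fit elsewhere, or None."""
    for idx, inst in enumerate(instances):
        if inst.idle_periods < idle_threshold:
            continue
        others = _copy(instances[:idx] + instances[idx + 1:])
        if not first_fit(inst.tasks, others):
            return idx
    return None


cluster = [
    Instance(1024, 2048, [Task(512, 1024)]),
    Instance(1024, 2048, [Task(256, 256)], idle_periods=3),
]
to_add = instances_to_add(cluster, [Task(1024, 1024)], [Task(256, 512)], 1024, 2048)
to_remove = removable_instance(cluster, idle_threshold=3)
```

In this toy cluster the pending task needs a full instance's CPU, so one new instance is requested, while the headroom placeholder fits into existing spare capacity. The second instance has been idle for three periods and its single task fits on the first, so it is the candidate for draining. A real autoscaler would also have to account for ports, placement constraints and instance type choice, which this sketch ignores.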
“You were available 24/7, always there with answers and resolutions. For us, this is a big sticking point when choosing SaaS vendors” – Don Tran, Head of Platform Services at On the dot
However streamlined integration processes are with Spot, complications do sometimes arise. Spot’s dedicated Customer Success team is available all day, every day to help resolve issues that might arise with customers’ accounts, or simply to answer queries that help customers make better use of the platform. This was important to On the dot: given their complex ECS architecture, knowing that a Spot expert was always available and more than happy to help was a big bonus for the team.
Having used Spot to reduce costs without requiring additional man-hours, On the dot is free to explore new possibilities. Don’s team is looking into Amazon EKS and is currently trialing how Kubernetes will work with their environment, with a view to then moving it to Spot to create the most optimized environment possible! The On the dot team are also working towards a completely serverless architecture, dedicated as the team is to constantly using the most advanced technical practices available.
Born in 2015, On the dot’s multi-award-winning delivery platform connects retailers’ desire for a unique and innovative shopping experience with shoppers’ demand for delivery convenience. As part of the CitySprint Group (same-day delivery leaders in the UK), they are using 10+ years of real data to make their tech smarter and deliveries more efficient.
On the dot’s focus is on innovating in the urban delivery sector (deliveries within a 10-mile radius from pick-up to drop-off). This means investigating new concepts, software, and technologies, with every second counting in the fast-paced world of urban delivery. Their offerings range from open APIs that integrate real-time timeslot availability with retailers’ e-commerce, EPOS or app platforms, to a courier app designed to improve experience and efficiency for couriers, alongside systems such as ML algorithms focused on optimizing deliveries through consolidation and on auto-assigning jobs to couriers to cut down on underutilized time.
https://www.onthedot.com/