For the last 23 years, Walla!News has run its workloads on-premises, managing the server hardware itself. But with over 1.5 million active mailboxes, real-time broadcasts, and traffic that constantly scales, Asi, Walla!News’s CTO, decided that moving to the public cloud was inevitable. “We are a publisher in a dynamic and competitive environment which requires agility and fast time to market. We wanted to be able to stay focused on building end-user products, without the overhead of on-premises data center maintenance,” said Asi.
From On-Prem to Cloud-Native in 6 months
The migration process for Walla!News was quick and focused. After researching numerous options, the Walla!News team decided that Amazon Elastic Container Service (ECS) was the best fit for their needs. The migration project was put in the hands of Elad Eizner, Walla!News’s DevOps Team Manager. Over the next six months, each of Walla!News’s apps was migrated, one by one, to a containerized ECS environment.
Immediately after the migration, Walla!News faced two main issues:
- Instances becoming unhealthy: From time to time, sudden, large traffic spikes caused instances in Walla!News’s clusters to become unhealthy. The cluster was not aware that the ECS agent on those instances was non-operational, so services failed to run.
- RI utilization: Another challenge was making sure the reserved instances (RIs) Walla!News had purchased were being utilized to the fullest. With multiple size-flexible reservations spread across different accounts, full utilization and visibility were nearly impossible.
Why Spot was the perfect fit to manage all Walla!News’s workloads cost-effectively
To handle Walla!News’s ECS agent issue, a new feature was created: Spot ECS Auto-Healing. Spot constantly monitors the state of the ECS agent on each instance and, if a failure occurs, automatically replaces the instance, ensuring the cluster performs as it should.
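The monitoring-and-replacement idea can be illustrated with a minimal Python sketch. It is not Spot’s implementation: the `ContainerInstance` class, the `agent_connected` flag, and the `max_failed_checks` threshold are all assumptions made for the example. The sketch counts consecutive failed agent checks per instance and reports which instances should be replaced.

```python
from dataclasses import dataclass


@dataclass
class ContainerInstance:
    """Hypothetical view of one cluster instance as seen by a health monitor."""
    instance_id: str
    agent_connected: bool   # assumed flag: is the ECS agent currently responding?
    failed_checks: int = 0  # consecutive checks in which the agent was down


def instances_to_replace(instances, max_failed_checks=3):
    """Run one monitoring pass.

    Resets the counter of every instance whose agent is responding, increments
    it for the others, and returns the IDs of instances whose agent has been
    down for at least `max_failed_checks` consecutive passes.
    """
    unhealthy = []
    for inst in instances:
        if inst.agent_connected:
            inst.failed_checks = 0
        else:
            inst.failed_checks += 1
            if inst.failed_checks >= max_failed_checks:
                unhealthy.append(inst.instance_id)
    return unhealthy


if __name__ == "__main__":
    fleet = [ContainerInstance("i-a", True), ContainerInstance("i-b", False)]
    for check in range(1, 4):
        print(check, instances_to_replace(fleet))
```

Requiring several consecutive failures before replacing an instance is a design choice of this sketch: it avoids recycling a machine because of a single missed check during a traffic spike.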
For reservation management, Walla!News configured Spot’s cross-account automatic flexible RI utilization. This ensured that all of Walla!News’s reserved instances would be fully utilized before any on-demand or spot capacity was used. The Spot Console now aggregated the reserved instances and presented them in Normalized Factor Units (NFUs), which also gave Walla!News more visibility into its RI utilization.
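To make the NFU view concrete, here is a small Python sketch of how size-flexible reservations from several accounts could be summed per instance family. The factor table follows the normalization factors AWS publishes for size-flexible RIs (small = 1, large = 4, xlarge = 8, and so on); the account names, the reservation data, and the function names are hypothetical and only illustrate the arithmetic, not the Spot Console’s actual logic.

```python
from collections import defaultdict

# Normalization factors per instance size, as published by AWS for
# size-flexible reserved instances (only a subset of sizes is listed).
NORMALIZATION_FACTOR = {
    "nano": 0.25, "micro": 0.5, "small": 1, "medium": 2,
    "large": 4, "xlarge": 8, "2xlarge": 16, "4xlarge": 32,
}


def nfu(instance_type, count=1):
    """Return the normalized units for `count` instances of e.g. 'm5.large'."""
    _family, size = instance_type.split(".")
    return NORMALIZATION_FACTOR[size] * count


def reserved_nfu_by_family(reservations):
    """Sum reservations from all accounts into NFUs per instance family.

    `reservations` is an iterable of (account, instance_type, count) tuples.
    """
    totals = defaultdict(float)
    for _account, instance_type, count in reservations:
        family = instance_type.split(".")[0]
        totals[family] += nfu(instance_type, count)
    return dict(totals)


def ri_utilization(reserved_units, running_units):
    """Fraction of reserved NFUs covered by running instances (0.0 to 1.0)."""
    if reserved_units == 0:
        return 0.0
    return min(running_units, reserved_units) / reserved_units


if __name__ == "__main__":
    # Hypothetical reservations spread over two accounts.
    reservations = [
        ("account-a", "m5.large", 4),
        ("account-b", "m5.xlarge", 2),
        ("account-a", "c5.2xlarge", 1),
    ]
    reserved = reserved_nfu_by_family(reservations)
    print(reserved)
    print(ri_utilization(reserved["m5"], nfu("m5.2xlarge", 1)))
```

In this example the four `m5.large` and two `m5.xlarge` reservations add up to 32 NFUs for the `m5` family, so a single running `m5.2xlarge` (16 NFUs) would use half of them regardless of which account holds the reservation.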
After a successful POC, the Walla!News DevOps team held a joint session with the Spot team, and within a few hours 100% of the ECS clusters were running on spot instances.
Walla!News also enabled the Spot ECS Auto-scaler, which helped them make the best use of the running instances and optimize the cluster. When an instance is underutilized, the Auto-scaler drains it by rescheduling its tasks onto the other instances and then scales it down.
The Auto-scaler listens to the scheduler and scales up according to each task’s CPU and memory needs. Lastly, as the figure shows, Spot’s Infrastructure-aware Scheduler deals with many different task types and assigns each one to the right machine. If needed, it spins up a machine with the right resources for the task to be executed.
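The scale-up and scale-down behavior described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Spot’s algorithm: it uses a simple first-fit placement, and the `Task` and `Instance` classes, the `CATALOG` of machine sizes, and the `place` and `try_drain` functions are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    cpu: int     # requested CPU units
    memory: int  # requested memory in MiB


@dataclass(eq=False)  # compare instances by identity, not by field values
class Instance:
    name: str
    cpu: int
    memory: int
    tasks: list = field(default_factory=list)

    def fits(self, task):
        """True if the task's CPU and memory fit into the remaining capacity."""
        free_cpu = self.cpu - sum(t.cpu for t in self.tasks)
        free_memory = self.memory - sum(t.memory for t in self.tasks)
        return task.cpu <= free_cpu and task.memory <= free_memory


# Hypothetical machine sizes, ordered from smallest to largest.
CATALOG = [("small", 1024, 2048), ("large", 4096, 8192)]


def place(task, cluster):
    """Scale up: reuse a running instance if possible, else launch the
    smallest catalog machine that can hold the task."""
    for inst in cluster:
        if inst.fits(task):
            inst.tasks.append(task)
            return inst
    for type_name, cpu, memory in CATALOG:
        if task.cpu <= cpu and task.memory <= memory:
            new = Instance(f"{type_name}-{len(cluster)}", cpu, memory, [task])
            cluster.append(new)
            return new
    raise ValueError(f"no machine type can run task {task.name}")


def try_drain(inst, cluster):
    """Scale down: move every task of `inst` to other instances and remove
    it from the cluster. Returns False and changes nothing if a task has
    nowhere to go."""
    others = [i for i in cluster if i is not inst]
    moves = []
    for task in inst.tasks:
        target = next((o for o in others if o.fits(task)), None)
        if target is None:
            for moved, host in moves:  # roll back partial moves
                host.tasks.remove(moved)
            return False
        target.tasks.append(task)
        moves.append((task, target))
    cluster.remove(inst)
    return True
```

A real scheduler would weigh many more factors, such as availability zones, spot interruption risk, and placement constraints; the sketch only shows how CPU and memory requests can drive both the decision to launch a machine and the decision to drain one.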
What’s next in the Walla!News and Spot journey
With 100% of the workloads managed by Elastigroup, using Spot for cost optimization became the standard at Walla!News.
Walla!News is an Israeli online publisher running one of the first and most popular web portals in Israel. The web portal provides news, search, e-mail, and many other services.
https://www.walla.co.il/