EMR workload continuity with Elastigroup

Shay Cohen

June 7, 2021

Amazon’s Elastic MapReduce (EMR) makes it easy to set up, operate, and scale big data environments, enabling data scientists and developers to rapidly analyze massive amounts of structured and unstructured data. Combined with Spot’s Elastigroup, EMR lets data scientists reliably run core and task nodes on highly affordable EC2 spot instances.

Eliminating a single point of failure

EMR’s popularity and broad adoption are due to its powerful parallel processing, where datasets are distributed and processed across multiple compute nodes. This distribution of data processing, as well as cluster health checks, is handled by the EMR master node.

However, if the master node fails, all data processing stops, and the cluster is automatically terminated.

To avoid this, Elastigroup now supports defining multiple master nodes in a single EMR cluster. If the primary master node fails, the cluster automatically fails over to a standby master node without impacting data processing.

Note: This is currently available for EMR version 5.23.0 and higher.
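
As a quick illustration of the version requirement, here is a minimal Python sketch that checks whether a release label meets the 5.23.0 minimum. The helper names are hypothetical and not part of any Spot or AWS SDK. The sketch assumes labels in the form "emr-5.23.0" or a bare "5.23.0". It compares the version components as numbers, so that 5.9.x is correctly treated as older than 5.23.0.

```python
from typing import Tuple

# Minimum EMR release that supports multiple master nodes, per the note above.
MIN_MULTI_MASTER_RELEASE = (5, 23, 0)


def parse_release(label: str) -> Tuple[int, ...]:
    """Turn 'emr-5.23.0' or '5.23.0' into a tuple of integers such as (5, 23, 0)."""
    version = label[len("emr-"):] if label.startswith("emr-") else label
    return tuple(int(part) for part in version.split("."))


def supports_multiple_masters(label: str) -> bool:
    """Return True if the release is 5.23.0 or higher (numeric comparison)."""
    return parse_release(label) >= MIN_MULTI_MASTER_RELEASE


if __name__ == "__main__":
    for release in ("emr-5.22.0", "emr-5.23.0", "emr-6.2.0"):
        print(release, supports_multiple_masters(release))
```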

Getting started with EMR multiple master nodes

When using Spot’s API, configure the following values:

The master group target value should be set to 3 (this is the value currently supported by AWS).
The master group lifecycle must be ON_DEMAND.
We recommend setting the terminationprotection value to false; otherwise, you won’t be able to terminate the cluster using Elastigroup.
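
The three settings above can be sketched as a small Python check that runs before any request is sent. This is only an illustration. The dictionary layout and the key names ("masterGroup", "target", "lifeCycle") are assumptions made for the example rather than Spot's confirmed request schema, and "terminationprotection" simply reuses the spelling from this post. Consult Spot's API documentation for the exact field names. The instance type is also a placeholder.

```python
import json


def build_cluster_settings(instance_type: str) -> dict:
    """Build an illustrative settings dict (key names are assumptions, not the real schema)."""
    return {
        "masterGroup": {
            "instanceTypes": [instance_type],
            "target": 3,               # three master nodes, the value currently supported by AWS
            "lifeCycle": "ON_DEMAND",  # master nodes must not run on spot instances
        },
        # Recommended: false, so the cluster can still be terminated through Elastigroup.
        "terminationprotection": False,
    }


def check_multi_master(settings: dict) -> list:
    """Return a list of human-readable problems; an empty list means all three rules hold."""
    problems = []
    master = settings.get("masterGroup", {})
    if master.get("target") != 3:
        problems.append("master group target should be 3")
    if master.get("lifeCycle") != "ON_DEMAND":
        problems.append("master group lifecycle must be ON_DEMAND")
    if settings.get("terminationprotection", False):
        problems.append("terminationprotection should be false")
    return problems


if __name__ == "__main__":
    settings = build_cluster_settings("m5.xlarge")  # placeholder instance type
    print(json.dumps(settings, indent=2))
    print(check_multi_master(settings))
```

Running a check like this locally catches a wrong target or lifecycle before the request reaches the API.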