Run an EMR Cluster on Spot Instances in 5 Steps - Spot.io

Introduction

 

In this tutorial, you will learn how to clone your Elastic MapReduce (EMR) clusters into an Elastigroup. AWS EMR provides a managed Big Data framework that lets you easily add or remove cluster capacity to match your application's workload. EMR supports Hadoop, Apache Spark, and other popular distributed frameworks. Running your EMR clusters on Elastigroup gives you the significant discounts that Spot Instances offer while maintaining 100% availability.

This tutorial focuses on cloning an existing EMR cluster into Elastigroup. Elastigroup also lets you wrap your existing cluster with Spot Instance Task nodes. Head to our tutorial on Wrapping EMR Clusters to learn more.

Prerequisites:

  1. A verified Spot by NetApp account.
  2. A running EMR cluster.
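
If you are not sure which cluster to use as the origin, a short script can list the candidates. The following is a minimal sketch, not part of the Elastigroup wizard: it assumes the boto3 library and AWS credentials are available, and the helper name `list_active_clusters` and the region in `main()` are illustrative. It treats clusters in the RUNNING or WAITING state as "running".

```python
from typing import Any, Dict, List, Tuple

# EMR cluster states treated as "running" for this sketch (an assumption).
ACTIVE_STATES = ["RUNNING", "WAITING"]


def list_active_clusters(emr_client: Any) -> List[Tuple[str, str, str]]:
    """Return (cluster_id, name, state) for every active EMR cluster.

    `emr_client` is expected to behave like boto3.client("emr").
    """
    clusters: List[Tuple[str, str, str]] = []
    kwargs: Dict[str, Any] = {"ClusterStates": ACTIVE_STATES}
    while True:
        page = emr_client.list_clusters(**kwargs)
        for cluster in page.get("Clusters", []):
            clusters.append(
                (cluster["Id"], cluster["Name"], cluster["Status"]["State"])
            )
        # list_clusters is paginated; a Marker means more results remain.
        marker = page.get("Marker")
        if not marker:
            return clusters
        kwargs["Marker"] = marker


def main() -> None:
    # Imported here so the helper above can be used without boto3 installed.
    import boto3

    emr = boto3.client("emr", region_name="us-east-1")  # example region
    for cluster_id, name, state in list_active_clusters(emr):
        print(cluster_id, name, state)
```

Calling `main()` prints one line per active cluster. The cluster ID is a useful way to identify the origin cluster when you reach Step 3.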

 

Step 1: Open The EMR Creation Wizard

 

Log in to the Elastigroup Console (console.spotinst.com) and navigate to the Creation Wizard by clicking the Create button in the Elastigroups tab.

 

 

In the Creation Wizard, select EMR.

 

 

Step 2: Add Elastigroup Description

 

Set the name and region of the Elastigroup. Click Next.

 

Step 3: Configure Strategy & Compute

 

  1. Under Strategy, select Clone and provide an “Origin Cluster” for Elastigroup to clone.
  2. For the Master, Core, and Task nodes, select the Instance Types, Lifecycle (Spot/On-Demand), Target, and Minimum/Maximum number of instances. To improve Spot availability, select multiple Instance Types.
  3. To spread the deployment widely, select as many Availability Zones (AZs) as possible and select Subnets within each AZ.
  4. (Optional) Assign tags to the Elastigroup.
  5. (Optional) Advanced Settings:
    • Set a Root Volume Size (GB).
      Warning: Decreasing the root volume size is not recommended and might prevent the instance group or the cluster from launching properly.
    • Include EMR Steps.
      Caution: This adds any steps configured in the original cluster to the clone.
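
Before entering the values in the wizard, it can help to write the plan down and check it. The sketch below is only an illustration of the guidance in this step. The `NodeGroup` class, its field names, and the thresholds (at least two Instance Types for Spot groups, at least two Availability Zones) are assumptions made for this example; they are not the Elastigroup JSON schema or a rule enforced by the wizard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeGroup:
    """Hypothetical record of the choices made for one node group."""
    role: str                  # "master", "core" or "task"
    instance_types: List[str]
    lifecycle: str             # "SPOT" or "ON_DEMAND"
    target: int
    minimum: int
    maximum: int


def check_group(group: NodeGroup) -> List[str]:
    """Return a list of human-readable problems for one node group."""
    problems: List[str] = []
    if not group.instance_types:
        problems.append(f"{group.role}: no instance types selected")
    elif group.lifecycle == "SPOT" and len(set(group.instance_types)) < 2:
        # Assumed threshold: the tutorial only says "multiple".
        problems.append(f"{group.role}: Spot group with a single instance type")
    if not (group.minimum <= group.target <= group.maximum):
        problems.append(
            f"{group.role}: target {group.target} is outside "
            f"[{group.minimum}, {group.maximum}]"
        )
    return problems


def check_plan(groups: List[NodeGroup], availability_zones: List[str]) -> List[str]:
    """Check every node group plus the Availability Zone spread."""
    problems = [p for g in groups for p in check_group(g)]
    if len(set(availability_zones)) < 2:
        # Assumed threshold: the tutorial says "as many as possible".
        problems.append("fewer than two Availability Zones selected")
    return problems


plan = [
    NodeGroup("master", ["m5.xlarge"], "ON_DEMAND", target=1, minimum=1, maximum=1),
    NodeGroup("core", ["m5.xlarge", "m5a.xlarge"], "SPOT", target=2, minimum=1, maximum=4),
    NodeGroup("task", ["c5.xlarge"], "SPOT", target=5, minimum=0, maximum=3),
]
for problem in check_plan(plan, ["us-east-1a"]):
    print(problem)
```

In this example the Task group is flagged twice (a single Instance Type on Spot, and a Target above its Maximum), and the plan is flagged once for using a single Availability Zone.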

       

Step 4: Scaling Policies (optional)

 

  1. Elastigroup offers a wide variety of scaling options for EMR, both for Core and Task nodes. Assign the ones relevant to your environment.
  2. Click Next.

 

Step 5: Review and Create

 

The Creation Wizard prepares a JSON template to launch an Elastigroup with the EMR configuration. All that’s left to do is click Create!

 

You’ve now created an EMR cluster on Elastigroup and are in the Elastigroup Manager view, where you can review, manage, and monitor your running Elastigroup.

Congratulations!

 

You have now learned how to create an EMR cluster on Spot Instances with Spot by NetApp, letting you:

  • Cut your costs by up to 80% while maintaining high availability.
  • Run on Spot Instances with zero overhead and no servers to manage: the Spot Elastigroup platform manages your infrastructure for you.

 

Next Steps

  • Create a Wrapped EMR Cluster on Elastigroup to run Task nodes for your existing EMR cluster on Spot Instances.
  • Configure Elastigroup’s Scaling Policies for EMR Core and Task nodes.
  • Check out our API Docs to learn how to clone your EMR cluster into an Elastigroup using the RESTful API.