Terraform Spot and Reserved Instances Management

Managing Spot and Reserved Instances with Terraform

For those unfamiliar, Terraform is an Infrastructure as Code tool used for building, changing, and versioning infrastructure safely and efficiently. Essentially an open-source, “cloud-agnostic CloudFormation”, Terraform can manage existing and popular service providers as well as custom in-house solutions.
Terraform uses configuration files to describe the components needed to run a single application or your entire infrastructure. It works by generating an execution plan that describes what it will do to reach the desired state, and then executing that plan to build the described infrastructure.

Reserved Instances (RIs) and Spot Instances are essential elements for lowering your cloud costs, but creating templates that properly manage these instances can be difficult, especially if you’re running RIs across accounts or looking to use Spot Instances for production workloads.

Thankfully, you can use Terraform and Spotinst together to manage your RIs and Spot Instances, making it both simple to take advantage of the savings they offer and 100% risk-free (you can place any production workload without a single point of failure on Spot). All of this is done by creating and managing a Spotinst Elastigroup resource from within your Terraform template.

How it’s done: managing Spotinst with Terraform

  1. Install and configure the Spotinst Terraform provider
  2. Configure a few parameters to define your mixed strategy of Reserved and Spot Instances
  • orientation – The strategy to optimize your cluster for. Supported values are costOriented, availabilityOriented, balanced, and equalAzDistribution.
  • spot_percentage – The percentage of the desired_capacity number that will be launched as Spot instances.
  • ondemand_count – The minimum number of on-demand instances to launch in the cluster. All other instances will be Spot instances. When this parameter is set, the spot_percentage parameter is ignored.
  • utilize_reserved_instances – If any reserved instances are available, Elastigroup will utilize them first before purchasing Spot Instances.
  • fallback_to_ondemand – If no Spot instances are available, Elastigroup will launch on-demand instances instead.
  • revert_to_spot – When Elastigroup moves instances back to Spot, you can define a time window for it to do so in order to support your SLA and working hours.
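
The two steps above can be sketched as a single template. The author's actual template is not shown in this post, so the following is a reconstruction from the attribute names and values that appear in the terraform plan output later on, and it should be read as an approximation rather than the original file. Several things are assumptions: the variable name spotinst_token is hypothetical, the provider's token argument is assumed to be the credential setting for the provider version in use, and risk is assumed to be this resource's name for the Spot percentage (the plan output lists risk and availability_vs_cost rather than spot_percentage and orientation, so argument names appear to differ between provider versions). The plan output also cannot tell us which values were set explicitly and which are provider defaults.

```hcl
# Hypothetical variable holding the Spotinst API token (name is an assumption).
variable "spotinst_token" {}

# Provider configuration. The "token" argument is an assumption about the
# provider version in use; check the provider documentation for your version.
provider "spotinst" {
  token = "${var.spotinst_token}"
}

# Elastigroup reconstructed from the plan output shown in this post.
resource "spotinst_aws_group" "elastigroup" {
  name        = "test-tf"
  description = "Testing tf"
  product     = "Linux/UNIX"

  capacity {
    target  = 1
    minimum = 1
    maximum = 1
  }

  strategy {
    risk                       = 100  # assumed: percentage of capacity run on Spot
    utilize_reserved_instances = true # use available RIs before buying Spot
    fallback_to_ondemand       = true # launch on-demand if no Spot is available
    draining_timeout           = 180
  }

  instance_types {
    ondemand = "t2.small"
    spot     = ["m4.large", "c3.large", "c4.large"]
  }

  availability_zone {
    name      = "us-east-1a"
    subnet_id = "subnet-2682f90b"
  }

  availability_zone {
    name      = "us-east-1c"
    subnet_id = "subnet-09eb9a52"
  }

  launch_specification {
    monitoring           = false
    image_id             = "ami-32b0b649"
    key_pair             = "test2"
    iam_instance_profile = "test-role"
    security_group_ids   = ["sg-848954f4"]
    # user_data is omitted here: the plan output shows only its hash.
  }

  roll_config {
    should_roll           = false
    batch_size_percentage = 25
    grace_period          = 300
  }

  tags = {
    Name      = "test-tf"
    CreatedBy = "Spotinst"
  }
}
```

The IDs above (subnets, AMI, security group, key pair, IAM profile) are the ones from the example run and would need to be replaced with values from your own account.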


Creating your first Elastigroup (and running your first Spot instances)

To initialize Terraform, run terraform init as shown below:

$ terraform init

Initializing provider plugins...

Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands should now work.
If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

In your terminal, from the folder where you created the Terraform template, run the terraform plan command:

$ terraform plan

Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

The Terraform execution plan has been generated and is shown below.
Resources are shown in alphabetical order for quick scanning. Green resources
will be created (or destroyed and then created if an existing resource
exists), yellow resources are being changed in-place, and red resources
will be destroyed. Cyan entries are data sources to be read.

Note: You didn't specify an "-out" parameter to save this plan, so when
"apply" is called, Terraform can't guarantee this is what will execute.

+ spotinst_aws_group.elastigroup
     availability_zone.#:                                       "2"
     availability_zone.1024954750.name:                         "us-east-1a"
     availability_zone.1024954750.subnet_id:                    "subnet-2682f90b"
     availability_zone.2410426630.name:                         "us-east-1c"
     availability_zone.2410426630.subnet_id:                    "subnet-09eb9a52"
     capacity.#:                                                "1"
     capacity.3157626798.maximum:                               "1"
     capacity.3157626798.minimum:                               "1"
     capacity.3157626798.target:                                "1"
     capacity.3157626798.unit:                                  "<computed>"
     description:                                               "Testing tf"
     instance_types.#:                                          "1"
     instance_types.2591038051.ondemand:                        "t2.small"
     instance_types.2591038051.spot.#:                          "3"
     instance_types.2591038051.spot.0:                          "m4.large"
     instance_types.2591038051.spot.1:                          "c3.large"
     instance_types.2591038051.spot.2:                          "c4.large"
     launch_specification.#:                                    "1"
     launch_specification.1996551530.ebs_optimized:             "<computed>"
     launch_specification.1996551530.health_check_grace_period: ""
     launch_specification.1996551530.health_check_type:         ""
     launch_specification.1996551530.iam_instance_profile:      "test-role"
     launch_specification.1996551530.iam_role:                  ""
     launch_specification.1996551530.image_id:                  "ami-32b0b649"
     launch_specification.1996551530.key_pair:                  "test2"
     launch_specification.1996551530.load_balancer_names.#:     "0"
     launch_specification.1996551530.monitoring:                "false"
     launch_specification.1996551530.security_group_ids.#:      "1"
     launch_specification.1996551530.security_group_ids.0:      "sg-848954f4"
     launch_specification.1996551530.shutdown_script:           ""
     launch_specification.1996551530.tenancy:                   ""
     launch_specification.1996551530.user_data:                 "fe9b62ddb7c5853c13912f8188b708a88f6f95e9"
     name:                                                      "test-tf"
     product:                                                   "Linux/UNIX"
     roll_config.#:                                             "1"
     roll_config.2383879618.batch_size_percentage:              "25"
     roll_config.2383879618.grace_period:                       "300"
     roll_config.2383879618.health_check_type:                  ""
     roll_config.2383879618.should_roll:                        "false"
     strategy.#:                                                "1"
     strategy.3674874163.availability_vs_cost:                  "<computed>"
     strategy.3674874163.draining_timeout:                      "180"
     strategy.3674874163.fallback_to_ondemand:                  "true"
     strategy.3674874163.ondemand_count:                        ""
     strategy.3674874163.risk:                                  "100"
     strategy.3674874163.spin_up_time:                          "<computed>"
     strategy.3674874163.utilize_reserved_instances:            "true"
     tags.%:                                                    "2"
     tags.CreatedBy:                                            "Spotinst"
     tags.Name:                                                 "test-tf"

Plan: 1 to add, 0 to change, 0 to destroy.

To create the Elastigroup, run the terraform apply command:

$ terraform apply

spotinst_aws_group.elastigroup: Creating...
 availability_zone.#:                                       "" => "2"
 availability_zone.1024954750.name:                         "" => "us-east-1a"
 availability_zone.1024954750.subnet_id:                    "" => "subnet-2682f90b"
 availability_zone.2410426630.name:                         "" => "us-east-1c"
 availability_zone.2410426630.subnet_id:                    "" => "subnet-09eb9a52"
 capacity.#:                                                "" => "1"
 capacity.3157626798.maximum:                               "" => "1"
 capacity.3157626798.minimum:                               "" => "1"
 capacity.3157626798.target:                                "" => "1"
 capacity.3157626798.unit:                                  "" => "<computed>"
 description:                                               "" => "Testing tf"
 instance_types.#:                                          "" => "1"
 instance_types.2591038051.ondemand:                        "" => "t2.small"
 instance_types.2591038051.spot.#:                          "" => "3"
 instance_types.2591038051.spot.0:                          "" => "m4.large"
 instance_types.2591038051.spot.1:                          "" => "c3.large"
 instance_types.2591038051.spot.2:                          "" => "c4.large"
 launch_specification.#:                                    "" => "1"
 launch_specification.1996551530.ebs_optimized:             "" => "<computed>"
 launch_specification.1996551530.health_check_grace_period: "" => ""
 launch_specification.1996551530.health_check_type:         "" => ""
 launch_specification.1996551530.iam_instance_profile:      "" => "test-role"
 launch_specification.1996551530.iam_role:                  "" => ""
 launch_specification.1996551530.image_id:                  "" => "ami-32b0b649"
 launch_specification.1996551530.key_pair:                  "" => "test2"
 launch_specification.1996551530.load_balancer_names.#:     "" => "0"
 launch_specification.1996551530.monitoring:                "" => "false"
 launch_specification.1996551530.security_group_ids.#:      "" => "1"
 launch_specification.1996551530.security_group_ids.0:      "" => "sg-848954f4"
 launch_specification.1996551530.shutdown_script:           "" => ""
 launch_specification.1996551530.tenancy:                   "" => ""
 launch_specification.1996551530.user_data:                 "" => "fe9b62ddb7c5853c13912f8188b708a88f6f95e9"
 name:                                                      "" => "test-tf"
 product:                                                   "" => "Linux/UNIX"
 roll_config.#:                                             "" => "1"
 roll_config.2383879618.batch_size_percentage:              "" => "25"
 roll_config.2383879618.grace_period:                       "" => "300"
 roll_config.2383879618.health_check_type:                  "" => ""
 roll_config.2383879618.should_roll:                        "" => "false"
 strategy.#:                                                "" => "1"
 strategy.3674874163.availability_vs_cost:                  "" => "<computed>"
 strategy.3674874163.draining_timeout:                      "" => "180"
 strategy.3674874163.fallback_to_ondemand:                  "" => "true"
 strategy.3674874163.ondemand_count:                        "" => ""
 strategy.3674874163.risk:                                  "" => "100"
 strategy.3674874163.spin_up_time:                          "" => "<computed>"
 strategy.3674874163.utilize_reserved_instances:            "" => "true"
 tags.%:                                                    "" => "2"
 tags.CreatedBy:                                            "" => "Spotinst"
 tags.Name:                                                 "" => "test-tf"

spotinst_aws_group.elastigroup: Creation complete after 8s (ID: sig-747be7a7)

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

You can verify the created Elastigroup in your Spotinst Console.
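
To make it easier to find the group in the console, one option is to expose its ID as a Terraform output. This is a minimal sketch that assumes the resource is addressed as spotinst_aws_group.elastigroup, as in the apply output above; the output name elastigroup_id is hypothetical.

```hcl
# Hypothetical output name; the value is the ID Terraform stores for the group
# (sig-747be7a7 in the example run above).
output "elastigroup_id" {
  value = "${spotinst_aws_group.elastigroup.id}"
}
```

After adding this block and running terraform apply again, terraform output elastigroup_id should print the group ID, which can then be matched against the ID shown in the Spotinst Console.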

Elastigroup integrates seamlessly with the reservations present in your AWS account, so a single Elastigroup can contain both RI and Spot Instances.

Final Words

Using Infrastructure as Code to provision servers makes your life far easier, but managing costs can still be a challenge.
Spot and Reserved Instances let you take advantage of a steep discount over On-Demand pricing. Combining these two options within a single Terraform plan helps you capture these cost savings while you sit back and enjoy 🙂