Terraform

Created: 2018-11-16 09:52:03 -0800 Modified: 2020-03-23 16:08:53 -0700

Basics (reference, docs)

Terraform is a way to codify your infrastructure’s configuration. For example, on AWS, instead of having to manually create a VPC, two security groups, make inbound/outbound rules, etc., you can just write HCL configuration files (“.tf” files) and have all of that be set up automatically.
If you find yourself needing to split your infrastructure up into different phases of setup that depend on one another but don’t wipe each other out completely, you may want to use Terragrunt (main site, link to my notes). This was helpful for me so that I didn’t need to set up the remote backend multiple times and copy terraform.tf all over the place. It also lets you share variables.
Module structure is described here. They recommend minimally having main.tf, outputs.tf, and variables.tf, even if they’re empty.
Terraform reads all “.tf” files in a given directory. For example, if you want to read in variables, you just point Terraform at your directory that contains all of your variable definitions.
When adding Terraform to your Git repo, this is a pretty good starting point for .gitignore.
If you need Terraform to zip content for you, there’s a provider for that (reference).
“concat” takes in two lists and forms a new list, but for some reason, you have to wrap that new list in the string notation:

subnet_ids = ["${concat(module.vpc.public_subnets, module.vpc.private_subnets)}"]

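↑ This is 100% definitely correct as of v0.11.10. (From 0.12 onward, the string wrapper is no longer needed and the result of concat(...) can be assigned directly.)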

Terraform inherently knows how to order certain operations. E.g. if Resource B relies on Resource A, then the creation order will always be A → B, and the deletion order will always be B → A.
- There are implicit dependencies that Terraform can determine from the references that you use, e.g. instance = "${aws_instance.account_server.id}" (reference). This implies that whatever we’re setting relies on the account_server instance already existing.
- There are explicit dependencies where you can use “depends_on” to tell Terraform that some instance relies on another one due to application-level code.
- Terraform will try parallelizing as much as possible, and it uses the dependency tree to figure this out.
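As a minimal sketch of the explicit case (0.11-style syntax to match the rest of these notes; the resource names, bucket name, and AMI are placeholders, not anything from a real setup), “depends_on” might look like this:

```hcl
# Hypothetical bucket that the application code reads at boot time.
resource "aws_s3_bucket" "app_assets" {
  bucket = "example-app-assets"
}

resource "aws_instance" "app_server" {
  ami           = "ami-2d39803a"
  instance_type = "t2.micro"

  # Nothing in this block references the bucket, so Terraform cannot infer
  # the dependency on its own. depends_on forces bucket → instance ordering
  # on creation and instance → bucket ordering on deletion.
  depends_on = ["aws_s3_bucket.app_assets"]
}
```

If the instance referenced an attribute of the bucket anywhere in its arguments, the dependency would be implicit and depends_on would be redundant.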
When you want all of the members of a list as input to another resource/data/whatever, the following are equivalents (with the second being much shorter):

azs = ["${data.aws_availability_zones.available.names[0]}", "${data.aws_availability_zones.available.names[1]}", "${data.aws_availability_zones.available.names[2]}"]
azs = ["${data.aws_availability_zones.available.names}"]

To access a particular property of a specific element in a list, you can use “element” (reference):

target_group_arn = "${element(aws_lb_target_group.alb-overseer-target-group.*.arn, count.index)}"

If you find yourself using ${type.name} frequently enough, you can assign an alias and just refer to the alias instead.
If there’s infrastructure that you manually created outside of Terraform and you want Terraform to be able to manage it, you use ”$ terraform import” (reference). I wrote a quick example of this in my notes here.
If you ever manually delete a piece of infrastructure from, say, AWS, and then rerun Terraform, it will detect that you deleted it and then try to recreate it.
When using a variable via ${foo}, it's just string interpolation, so you can put text afterward if you want, e.g. this for pulling the latest Docker image: "${aws_ecr_repository.ecr-verdaccio.repository_url}:latest".
Providers, like AWS, sometimes do not have their documentation up-to-date. When this happens, you can look through the provider’s GitHub (e.g. AWS’s is here) at the corresponding .go file. For example, I knew that ecs_task_definition had more parameters allowed than what they documented, so I found the corresponding file on GitHub here and just did a ctrl+F for the parameter I wanted.
Provisioners can be used to bootstrap a server when it’s created (i.e. it is not supposed to be used on already-running instances). This is probably useful if you need to create a file or something for your application to run. You can also trigger Terraform to run a provisioner like Chef or Ansible. In general, if you’re using something like Docker, then you likely won’t need provisioners at all since Docker can wrap that entire process.
- Provisioners can also be configured to run when the resource is destroyed, e.g. to deregister themselves with a service registry so that the registry doesn’t have to rely on health checks.
Terraform only modifies what’s necessary to go from the current state to the desired state (this is sort of like how a virtual DOM works). This even works if, for example, you remove an instance completely; Terraform will know that it has to delete it.
- Terraform saves its state to a “terraform.tfstate” file. This contains information about what it’s installed, where it left off, etc. It’s highly recommended that you store this remotely rather than in your repository (reference). You can use something like AWS’s S3 to store this (although remember, bucket names are globally unique, so you can’t just copy/paste the S3 file or the “terraform” directive). If you do follow that reference link, it’s probably helpful to make a subfolder just for the Terraform S3/Dynamo configuration. Then, keep in mind that the “terraform” directive is for configuration after having set up S3/Dynamo, so put that into your main repo.
- For Google Cloud Platform, it’s not in their getting-started guide, but apparently there is a way to store the state: https://www.terraform.io/docs/backends/types/gcs.html
  - If that doesn’t work, you can use Terraform’s cloud (reference) which should be free for up to 5 users.
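For the S3 case, a rough sketch of the “terraform” directive is below. It assumes the bucket and the DynamoDB lock table were already created by the separate S3/Dynamo configuration; the bucket name, key, and table name are made up for illustration.

```hcl
terraform {
  backend "s3" {
    # Hypothetical names. The bucket must already exist, and bucket names
    # are globally unique, so this cannot be copied verbatim.
    bucket = "example-terraform-state"
    key    = "main/terraform.tfstate"
    region = "us-west-2"

    # Optional state locking. The table needs a string primary key
    # named "LockID".
    dynamodb_table = "example-terraform-locks"
    encrypt        = true
  }
}
```

Backend blocks can’t use interpolation, so these values have to be literals (or be supplied when running “terraform init”).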
It’s a good idea to commit all non-secret Terraform resources (i.e. your configuration files) so that you have version history and so that other people can contribute to your infrastructure configuration.
Installation: download a zip from here and put it somewhere in your PATH (e.g. /usr/local/bin). Then do “$ terraform -v” to make sure it installed with the right version. Here’s how I did this from the command line:
- wget https://releases.hashicorp.com/terraform/0.11.10/terraform_0.11.10_linux_amd64.zip -O ~/terraform.zip
- sudo unzip ~/terraform.zip -d /usr/local/bin/
There’s a Visual Studio Code plug-in for linting/formatting here. For more information on how to configure this, see the section below. I like to turn CodeLens off: “terraform.codelens.enabled”: false
“terraform init” will download and install any provider-specific binaries.
Configurations generally contain various directives like “resource” or “data” (or even “terraform” itself):
- “resource” is for dynamically creating and managing things like EC2 instances
- “data” is for fetching values from resources already created outside of Terraform entirely. E.g. if you manually created a database in RDS and just want to use it from some other instance, you can use “data aws_db_instance” or “data aws_rds_cluster” to search for it.
- “resource” and “data” typically go hand-in-hand; resource is for creation, and data is for fetching. “module” can be used with “data” in a slightly less obvious way—a module is just a set of resources, so you would use the same data APIs that you’d use as though they were resources. For example, there’s an AWS VPC module, so if you want properties on the resulting VPC, you could use the data directive for “aws_vpc”.
- Referring between “resource” and “data” is easy (reference):

Using a resource from data:

data "aws_ecs_task_definition" "mongo" {
  # This points to the resource below to get the family
  task_definition = "${aws_ecs_task_definition.mongo.family}"
}

resource "aws_ecs_task_definition" "mongo" {
  family = "mongodb"

  # container_definitions goes here
}

Using data from a resource:

data "aws_db_instance" "database" {
  db_instance_identifier = "botland"
}

resource "aws_db_instance" "default" {
  instance_class = "${data.aws_db_instance.database.db_instance_class}"
}

“for loops” (they’re not really “for” loops) (reference) - specify “count” and ”${count.index}”

resource "aws_instance" "example" {
  count = 3
  ami = "ami-2d39803a"
  instance_type = "t2.micro"
  tags {
    Name = "example-${count.index}"
  }
}

Counts are 0-indexed.

There’s more information here on how to use variables with ${count.index} to get a set of non-numeric values, e.g. “first”/“second”/“third” (reference).
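A small sketch of that idea in 0.11-style syntax, assuming a hypothetical list variable named “suffixes” (the AMI is reused from the example above):

```hcl
# Hypothetical list of non-numeric values, one per instance.
variable "suffixes" {
  type    = "list"
  default = ["first", "second", "third"]
}

resource "aws_instance" "example" {
  # One instance per list entry instead of a hard-coded count.
  count         = "${length(var.suffixes)}"
  ami           = "ami-2d39803a"
  instance_type = "t2.micro"

  tags {
    # element() picks the entry that matches this instance's 0-based index.
    Name = "example-${element(var.suffixes, count.index)}"
  }
}
```

One caveat with this pattern is that instances are tracked by index, so removing an entry from the middle of the list shifts everything after it.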
Using environment variables (reference)
- Environment variables are a special way of providing a value to an input variable, meaning if you don’t have it set, Terraform is going to ask you for it before being able to even run ”$ terraform apply”.
- Environment variables must start with “TF_VAR_”, e.g. TF_VAR_FOO.
- A TF file needs to manifest the input variable, e.g.

variable "FOO" {}

To use the variable: “${var.FOO}”
You can (and likely will) have multiple providers (reference). Here’s a simple example right from the reference link:

provider "aws" {
}

provider "aws" {
  alias = "west"
  region = "us-west-2"
}

resource "aws_instance" "foo" {
  provider = "aws.west"
}

VSCode plug-in (reference)

You do need the Terraform binary on your computer for autoformatting to work since it just calls into ”$ terraform fmt”, and you can customize the path via the setting “terraform.path”, although you have to reload after doing that.
If you want to use IndentRainbow, you need to add a custom setting for Terraform:

"indentRainbow.indentSetter": {
  "terraform": { "tabSize": 2, "insertSpaces": true }
}

Importing a resource

Basics

For S3 buckets, the identifier you provide to “import” is their name (because they’re globally unique), not an ARN or ID.
Whenever Terraform owns a resource, ”$ terraform destroy” would get rid of that resource completely. You may want to stop this behavior with “prevent_destroy”, which is available to all resources (reference).
Importing a module is not immediately obvious even though they have a guide here on how it’s supposed to work. For example, I was using the “vpc” module (reference) and I wanted to import the VPC itself into Terraform. I needed to go to that reference page, click “Resources” to figure out what they named it, and then eventually plug that into the “import” command like this:

$ terraform import module.vpc.aws_vpc.this vpc-00fadeb0c362fda12

To “unimport” a resource, you use the state management commands (reference). For example:
- $ terraform state list
  - This gives you a list of resources that are tracked
- $ terraform state show aws_db_instance.default
  - This gives you information about a particular instance
- $ terraform state rm aws_db_instance.default
  - This will remove the instance. As of version 0.11.10, it doesn’t look like this ever fails, even if the instance hasn’t been imported.

Example

Scenario: I wanted to add a security group to an RDS instance that already existed in AWS, but I didn’t want Terraform to destroy the database when I ran ”$ terraform destroy”

Make a brand new folder
Make main.tf in that folder (see below for contents)
terraform init
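terraform import aws_db_instance.default DB_INSTANCE_IDENTIFIER (e.g. botland; the AWS provider docs say to import DB instances by identifier rather than by ARN)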

main.tf from the steps above:

data "aws_db_instance" "database" {
  db_instance_identifier = "botland"
}

resource "aws_db_instance" "default" {
  instance_class = "${data.aws_db_instance.database.db_instance_class}"

  # This isn't necessary, but RDS adds this tag by default and I just didn't want to remove it
  tags {
    workload-type = "other"
  }

  lifecycle {
    prevent_destroy = true
  }
}

Validating Terraform

“terraform validate” is the command to validate all of the Terraform files in the current directory. However, if there are any input variables, then those need to be set. I found that it’s easiest to perform these steps:

Run “terraform validate” and have it fail and spew a ton of variable-related errors
Copy/paste all of those into a text editor, e.g.

DATABASE_HOST

MATCHMAKER_REST_PORT

DATABASE_USER

DATABASE_NAME

Use multiple cursors to change them into this format

export TF_VAR_DATABASE_HOST=1234

export TF_VAR_MATCHMAKER_REST_PORT=1234

export TF_VAR_DATABASE_USER=1234

export TF_VAR_DATABASE_NAME=1234

To explain those changes
1. “export” is just to get the variable into your current environment
2. “TF_VAR_” is needed when specifying the variables via the environment so that Terraform knows that those variables are for Terraform
3. Numbers are specified for everything since a number is also a string, but a string is not always a number. That way, you don’t need to cherry-pick which variables need to be set to numbers, so you’re less likely to see errors on subsequent runs of “terraform validate”.

Variables

Input variables are used to configure your infrastructure just like command-line arguments would work (reference). Sample definition and usage:

variable "secret_key" {}
variable "region" {
  default = "us-east-1"
}

provider "aws" {
  secret_key = "${var.secret_key}"
  region = "${var.region}"
}

If an input variable has no default and you don’t supply a value any other way, you’ll be prompted for it when you run “$ terraform apply”.
You can provide default values to input variables.
You can specify any input variables directly via the command line: -var 'foo=bar'.
You can specify lists and maps if you need more control than just strings and numbers.
Output variables are for highlighting pieces of information at the end of a ”$ terraform apply”. E.g.

output "ip" {
  value = "${aws_eip.ip.public_ip}"
}

(↑ note: you can put this in any .tf file)

You can even use a module’s output variables as your own (reference):

output "consul_server_asg_name" {
  value = "${module.consul.asg_name_servers}"
}

Any time you run “terraform apply”, you’ll see the outputs even if Terraform didn’t need to change anything, and even if it failed (although it does need to at least succeed to the point where your variable can be set, e.g. if you can’t create an EC2 instance due to a naming clash and try to get the IP address as output, it’ll fail completely).
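Going back to the list and map input variables mentioned above, here is a rough 0.11-style sketch. All of the variable names and values are invented for illustration; the AMI is the one used earlier in these notes.

```hcl
variable "environment" {
  default = "staging"
}

# A map from environment name to instance size.
variable "instance_types" {
  type = "map"

  default = {
    staging    = "t2.micro"
    production = "t2.small"
  }
}

# A list of availability zones to choose from.
variable "availability_zones" {
  type    = "list"
  default = ["us-west-2a", "us-west-2b"]
}

resource "aws_instance" "example" {
  ami = "ami-2d39803a"

  # lookup() reads one key out of the map.
  instance_type = "${lookup(var.instance_types, var.environment)}"

  # Lists can be indexed directly.
  availability_zone = "${var.availability_zones[0]}"
}
```

With this in place, running with -var 'environment=production' would select the other map entry.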

Modules (reference)

When using a module, the string that comes after the word “module” must be unique, so that means it doesn’t have to match the original name of the module. It’s the source property that tells Terraform which code to use. E.g.:

module "I-can-name-this-whatever-I-want-as-long-as-it-is-unique" {
  # The source tells it to use this module
  source = "terraform-aws-modules/security-group/aws"
  version = "2.9.0"
}

There’s a module registry that contains commonly shared pieces of functionality: https://registry.terraform.io/
- To use the registry, make sure you look at the various tabs, especially Inputs and Outputs.
- You don’t have to use the registry; modules can be installed from a variety of sources (e.g. Git, HTTP, or local files).
To use a module from a local directory, just specify the source as a path to a directory (not a file):

module "some_example" {
    source = "./some_example/"
}

# ./some_example/main.tf
provider "aws" {
  region = "us-west-2"
}

resource "aws_instance" "example" {
  ami = "ami-0afae182eed9d2b46"
  instance_type = "t2.micro"
}

By doing ”$ terraform apply”, you’ll create an EC2 instance with the specified AMI.

I think you should always specify the version of a particular module that you’re going to use.
”$ terraform init” will install any modules needed onto your machine.
- If you don’t want to rely on a bunch of external sites (e.g. GitHub, the Terraform registry, etc.), then you can check out all of the modules and include them as git submodules and then use them from that local git repo.
Modules can be nested. Even on their example page, they show “module.consul.module.consul_clients” being used for something, which is the “consul_clients” submodule of the “consul” module.

AWS Basics (reference)

Terraform can automatically find ~/.aws for credentials, but there are also environment variables for these:

$ export AWS_ACCESS_KEY_ID="anaccesskey"

$ export AWS_SECRET_ACCESS_KEY="asecretkey"

$ export AWS_DEFAULT_REGION="us-west-2"

↑ Note: it is not TF_VAR_AWS_BLAH
Terraform cannot find ~/.aws/config on its own, so you do either need AWS_DEFAULT_REGION set or to provide region = “us-foo-1” in your “provider” block.
For service-linked roles (which are roles predefined by AWS (reference)), you need to use aws_iam_service_linked_role. For example, to create an ECS service, you need the appropriate service-linked role.
Each VPC has a default route table that Terraform can’t create, but it can manage it (reference). I didn’t figure out whether you should ever use aws_route_table over aws_default_route_table, but using aws_default_route_table will definitely work for a VPC as long as you’re okay with it wiping out any existing routes when Terraform adopts it.
When making an IAM role, aws_iam_role lets you specify a policy via assume_role_policy. This policy is only for the trust relationship (i.e. who is allowed to assume the role), not for the actual permissions. So, for example:

resource "aws_iam_role" "ecsTaskExecutionRole" {
  name = "ecsTaskExecutionRole"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

When you want to attach policies, you can attach a managed policy via an ARN with aws_iam_role_policy_attachment or make your own policy with aws_iam_role_policy.
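A sketch of both options, assuming the ecsTaskExecutionRole resource from the example above. The managed policy ARN is AWS’s standard ECS task execution policy; the inline policy’s name and bucket are hypothetical.

```hcl
# Option 1: attach an existing managed policy by ARN.
resource "aws_iam_role_policy_attachment" "ecs_task_execution" {
  role       = "${aws_iam_role.ecsTaskExecutionRole.name}"
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# Option 2: write your own inline policy on the role.
resource "aws_iam_role_policy" "read_config_bucket" {
  name = "read-config-bucket"
  role = "${aws_iam_role.ecsTaskExecutionRole.id}"

  # Permissions policy (not a trust policy); the bucket name is a placeholder.
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-config-bucket/*"
    }
  ]
}
EOF
}
```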

To get the current region without having to hard-code it, look at this snippet:

data "aws_region" "current" {}

Then, just use “${data.aws_region.current.name}” to refer to the region name.

To get availability zones without having to hard-code them, look at this snippet (reference):

data "aws_availability_zones" "available" {}

module "vpc" {
  azs = ["${data.aws_availability_zones.available.names[0]}", "${data.aws_availability_zones.available.names[1]}", "${data.aws_availability_zones.available.names[2]}"]
}

Command-line (reference)

“$ terraform apply” - see what changes will be made before having to manually accept them (which is just “$ terraform plan”), then push them to your various cloud platforms.
- Note: ”-/+” next to a resource means it will get destroyed and recreated rather than modifying it in-place (reference).
”$ terraform show” - show human-readable output of the current state

AWS usage (ref to “basics” notes)

CloudWatch alarms

This is more AWS-specific than Terraform-specific, but I struggled with it while using Terraform, so I’m writing it in this note.

CloudWatch alarms can be scoped down to individual resources via “dimensions”. The exact names/values of each dimension are specified in the AWS documentation. For example, when I wanted to make a CloudWatch alarm for a specific SQS queue, I looked at the Terraform docs for CloudWatch alarms, but they pointed me to a summary table on the AWS docs, which pointed me to the SQS-specific docs, which finally got me to the “available CloudWatch metrics page”, and aaaall the way at the bottom, I saw this:

The only dimension that Amazon SQS sends to CloudWatch is QueueName. This means that all available statistics are filtered by QueueName.
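Putting that together, an alarm scoped to a single queue might look roughly like the sketch below. The queue, the alarm name, and the threshold numbers are all invented; ApproximateNumberOfMessagesVisible is one of the standard SQS metrics in the AWS/SQS namespace.

```hcl
# Hypothetical queue to alarm on.
resource "aws_sqs_queue" "jobs" {
  name = "example-jobs"
}

resource "aws_cloudwatch_metric_alarm" "jobs_backlog" {
  alarm_name          = "example-jobs-backlog"
  namespace           = "AWS/SQS"
  metric_name         = "ApproximateNumberOfMessagesVisible"
  statistic           = "Average"
  comparison_operator = "GreaterThanThreshold"

  # Placeholder numbers: alarm when the average backlog exceeds 100
  # messages for two consecutive 5-minute periods.
  threshold          = 100
  period             = 300
  evaluation_periods = 2

  # QueueName is the only dimension that SQS reports.
  dimensions = {
    QueueName = "${aws_sqs_queue.jobs.name}"
  }
}
```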

Security groups that depend on one another

Suppose you have LoadBalancerX that allows outbound traffic to ServerY, so ServerY wants to allow inbound traffic from LoadBalancerX. This is a “circular” rule, and it’s easy to accomplish with Terraform:

Create your security groups first and have them be completely empty. They should be completely empty due to the advisory at the top of this page:

NOTE on Security Groups and Security Group Rules: Terraform currently provides both a standalone Security Group Rule resource (a single ingress or egress rule), and a Security Group resource with ingress and egress rules defined in-line. At this time you cannot use a Security Group with in-line rules in conjunction with any Security Group Rule resources. Doing so will cause a conflict of rule settings and will overwrite rules.

Add your security group rules afterward.

module "vpc" {
  #...
}
module "sg-server" {
  source = "terraform-aws-modules/security-group/aws"
  version = "2.9.0"

  name = "example-server"
  vpc_id = "${module.vpc.vpc_id}"
}

module "sg-loadbalancer" {
  source = "terraform-aws-modules/security-group/aws"
  version = "2.9.0"

  name = "example-public-alb"
  vpc_id = "${module.vpc.vpc_id}"
}

resource "aws_security_group_rule" "allow-load-balancer-to-server" {
  type = "egress"
  from_port = 4873
  to_port = 4873
  protocol = "tcp"
  source_security_group_id = "${module.sg-server.this_security_group_id}"

  security_group_id = "${module.sg-loadbalancer.this_security_group_id}"
}

Just keep in mind that the example above only sets up the egress rule, not the ingress rule.
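The matching ingress rule would presumably be the mirror image of the egress rule above: same port, with the two security groups swapped. The resource name here is made up; the module outputs are the same ones used above.

```hcl
resource "aws_security_group_rule" "allow-server-from-load-balancer" {
  type      = "ingress"
  from_port = 4873
  to_port   = 4873
  protocol  = "tcp"

  # For an ingress rule, this is where the traffic comes from.
  source_security_group_id = "${module.sg-loadbalancer.this_security_group_id}"

  # The group that the rule is added to.
  security_group_id = "${module.sg-server.this_security_group_id}"
}
```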

Passing values to CI (CircleCI + AWS)

I’m using CircleCI and AWS, so this is somewhat specific to those technologies. For reference, the scenario that I have is something like this:

I run ”$ terraform apply” from Workflow A
I run commands that require the infrastructure that “apply” creates from Workflow B
Thus, I can’t just persist a few variables to a workspace because they’re separate workflows entirely

Here are the solutions that I thought of:

Parameterize the names of everything in CircleCI environment variables. When creating with Terraform, use those names. When needed in CircleCI, use the AWS CLI to look up the values by those names. This won’t be possible with dynamically created values, e.g. an ARN
Similar to parameterizing names, you can add tags to the resources that you create and then look them up by tags.
Use AWS SSM Parameter Store to save the key from Terraform (aws_ssm_parameter) and read it from CircleCI using the AWS CLI.
1. Alternatively, you could store in S3.
Install Terraform from Workflow B and use it to query values using output variables (and probably ”$ terraform output”). This involves installing Terraform and setting up the remote back-end (that points to S3) to be able to do this.
Use the CircleCI API from Workflow A with local-exec to edit an existing CircleCI environment variable that you read from Workflow B (reference)
Use terraform_remote_state to bypass the issue altogether. This lets you use something like ${data.terraform_remote_state.some_other_terraform_folder.output_from_that_folder} between folders.

In my experience, solution #1 is the easiest to set up since:

You have access to CircleCI’s environment variables from Terraform as long as you name them “TF_VAR_*” and add them to variables.tf.
Workflow B is almost always using the AWS CLI anyway (or else it can’t really interact with AWS).

I used solution #4 when I already had a CircleCI job with Terraform and needed an ARN out of Amazon.
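For comparison, the Parameter Store approach (solution #3) could be sketched like this on the Terraform side. The SNS topic is just a stand-in for any resource with a dynamically created ARN, and the parameter name is hypothetical.

```hcl
# Stand-in for any resource whose ARN is only known after creation.
resource "aws_sns_topic" "deploy_events" {
  name = "example-deploy-events"
}

# Workflow A writes the ARN to a well-known parameter name.
resource "aws_ssm_parameter" "deploy_events_topic_arn" {
  name  = "/example/deploy-events-topic-arn"
  type  = "String"
  value = "${aws_sns_topic.deploy_events.arn}"
}
```

Workflow B could then read it back with something like “aws ssm get-parameter --name /example/deploy-events-topic-arn --query Parameter.Value --output text”.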

Troubleshooting

AWS-specific problem - “the target group with targetGroupArn … does not have an associated load balancer” or “a listener already exists on this port for this load balancer” (reference)

I had a very specific scenario that I wanted to set up. In short, I wanted there to be two services online at any given time (as a primary/fail-over setup), but I only wanted the load balancer to point at the primary one. This meant that I needed two target groups, but when trying to set them up with Terraform, I would run into one of two errors:

“the target group with targetGroupArn … does not have an associated load balancer” - this was if I tried just making a target group with no listener
“a listener already exists on this port for this load balancer” - this was if I tried adding a listener (with the same port) for each of the target groups

The reference link has some workarounds, but I came up with one that I like a little bit better—you make a second listener for a port that you never intend to use (e.g. 13531). The container that you’re making the listener for doesn’t even need to expose that port, nor do you have to take in traffic on that port.

Note that health checks are just going to constantly fail for this since the container doesn’t expose that port.
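A rough sketch of that workaround is below. It assumes the load balancer and VPC already exist and are passed in as the hypothetical variables “alb_arn” and “vpc_id”; the target group names and the container port are placeholders too.

```hcl
variable "alb_arn" {}
variable "vpc_id" {}

resource "aws_lb_target_group" "primary" {
  name     = "example-primary"
  port     = 4873
  protocol = "HTTP"
  vpc_id   = "${var.vpc_id}"
}

resource "aws_lb_target_group" "failover" {
  name     = "example-failover"
  port     = 4873
  protocol = "HTTP"
  vpc_id   = "${var.vpc_id}"
}

# The real listener: all traffic goes to the primary target group.
resource "aws_lb_listener" "primary" {
  load_balancer_arn = "${var.alb_arn}"
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = "${aws_lb_target_group.primary.arn}"
  }
}

# Placeholder listener on a port that is never used. It exists only so
# that the fail-over target group is associated with the load balancer.
resource "aws_lb_listener" "failover_placeholder" {
  load_balancer_arn = "${var.alb_arn}"
  port              = 13531
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = "${aws_lb_target_group.failover.arn}"
  }
}
```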

AWS-specific problem - Resource FOO not found for variable when working with task definitions

“terraform validate” will only validate the structure of your HCL; if you have a JSON blob, it won’t even be checked to see if it’s valid JSON let alone if AWS would be able to understand it. Here’s a snippet of a problem that I had:

resource "aws_ecs_task_definition" "game-server-task" {
  container_definitions = <<DEFINITION
[
  {
    "name": "botland-game-server",
    "image": "${data.terraform_remote_state.app_foundation.game_server_ecr_repo_url}:${var.GAME_SERVER_DOCKER_IMAGE_TAG}",
    "portMappings": [
      {
        "containerPort": ${var.GAME_SERVER_REST_PORT},
        "hostPort": ${var.GAME_SERVER_REST_PORT},
        "protocol": "tcp"
      }
    ]
  }
]
DEFINITION
}

Originally, I had specified containerPort and hostPort as strings, not as integers. Terraform would validate, and the “apply” command would run, but then I got an error in a different resource saying that game-server-task could not be located.

To fix this, I just had to remove the quotation marks in this particular case. However, the general solution when “terraform validate” works but “terraform apply” doesn’t is to look at the sections of your HCL that cannot be validated.

The problem is somewhat difficult to see sometimes because normal HCL doesn’t allow typing variables outside of quotation marks (even if they represent numbers):

resource "foo" "bar" {
  valid_example = "${var.MATCHMAKER_REST_PORT}"
  invalid_example = ${var.MATCHMAKER_REST_PORT}
}

No dynamic for_each

I ran into a problem where I was trying to make a load balancer with several listeners that would have custom rules. However, the module for making listeners in an ALB does not let you specify the rules, so I would have needed to create the load balancer with the array of listeners, then later, when adding the rules, make sure the indices of the rules matched the indices of the listeners. This would make updating very brittle since I would need to make sure these indices are always in sync.

Someone else had a similar issue here and it’s apparently all going to get fixed in version 0.12 (reference).

Failure configuring LB attributes: InvalidConfigurationRequest: Access Denied for bucket: bucketname. Please check S3bucket permission (reference)

In short, this is apparently just tedious with Terraform. You have to set up an S3 policy that involves a principal ID that changes based on the region that you’re in. Thankfully, Terraform provides an easy way to get that principal ID (reference), and the StackOverflow post tells you exactly how to put all of this together:

data "aws_elb_service_account" "main" {}

resource "aws_s3_bucket" "alb-logs" {
  bucket = "botland-alb-access-logs"
  acl = "private"

  tags {
    CreatedBy = "Terraform"
  }

  policy = <<POLICY
{
  "Id": "Policy",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:PutObject"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::botland-alb-access-logs/AWSLogs/*",
      "Principal": {
        "AWS": [
          "${data.aws_elb_service_account.main.arn}"
        ]
      }
    }
  ]
}
POLICY
}

xDeadBringerx: @Adam13531 If you do not like that HEREDOC syntax with JSON for the policy you can use https://www.terraform.io/docs/providers/aws/d/iam_policy_document.html
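Following that suggestion, the same bucket policy could probably be expressed without the heredoc along these lines. This is a sketch and not something I have applied; the data source name “alb_logs” is made up.

```hcl
# Same data source as in the snippet above; keep only one copy of it.
data "aws_elb_service_account" "main" {}

data "aws_iam_policy_document" "alb_logs" {
  statement {
    effect    = "Allow"
    actions   = ["s3:PutObject"]
    resources = ["arn:aws:s3:::botland-alb-access-logs/AWSLogs/*"]

    principals {
      type        = "AWS"
      identifiers = ["${data.aws_elb_service_account.main.arn}"]
    }
  }
}
```

The bucket would then use policy = "${data.aws_iam_policy_document.alb_logs.json}" instead of the inline JSON.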

InvalidParameterException: Unable to assume role and validate the specified targetGroupArn.

Full error text:

InvalidParameterException: Unable to assume role and validate the specified targetGroupArn. Please verify that the ECS service role being passed has the proper permissions.

status code: 400, request id: 22fec17c-ec2b-11e8-b467-cff99a258adc “verdaccio”

This happened when I was specifying a load balancer ARN instead of a target group ARN (which is a rookie mistake). I just needed this:

resource "aws_ecs_service" "verdaccio" {
  load_balancer {
    target_group_arn = "${aws_lb_target_group.alb-verdaccio-target-group.arn}"
  }
}

error deleting S3 Bucket (botland-alb-access-logs): BucketNotEmpty (reference)

Terraform has a “force_destroy” option for aws_s3_bucket (reference), but I think you either need to have specified it when the bucket was originally created, or you need to modify the tfstate to figure out how to allow termination from there (it’s probably “force_destroy”: “true” (with quotes around “true”) (reference)), or just go to the AWS console and manually delete the bucket yourself.
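For reference, setting the option on the bucket from the earlier example would look like the sketch below. My understanding, which I haven’t verified against this exact version, is that applying this change first and then destroying should also work for a bucket that already exists.

```hcl
resource "aws_s3_bucket" "alb-logs" {
  bucket = "botland-alb-access-logs"
  acl    = "private"

  # Lets "terraform destroy" delete the bucket even if it still contains
  # objects. The objects are deleted too and are not recoverable.
  force_destroy = true
}
```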

InvalidParameterException: Unable to assume the service linked role

As far as I can tell, this is a race condition with IAM because IAM is not strongly consistent and takes some time to propagate. There are many workarounds:

Add in a “sleep 10” to the creation of the role (reference)
Manually create the service-linked role outside of Terraform entirely; it seems like this only happens the first time Terraform runs. Just be careful because if you have a service-linked role already created and you tell Terraform to make the same one, Terraform will fail.
Use a depends_on chain as mentioned by warpaint:
- [11:40] warpaint: There’s some shenanigans you can pull with depends_on - have your, say, ALB depend on the service-linked role, then have your esc_cluster depend on the ALB. ALBs take a few mins to create which should be enough time.
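The first and third workarounds might be sketched like this. The cluster is only a stand-in for whatever actually needs the role, and the “sleep” command assumes Terraform is running on a machine with a POSIX shell.

```hcl
resource "aws_iam_service_linked_role" "ecs" {
  aws_service_name = "ecs.amazonaws.com"

  # Crude workaround: pause after creation so that IAM has a few seconds
  # to propagate before dependent resources are created.
  provisioner "local-exec" {
    command = "sleep 10"
  }
}

# Hypothetical resource that must wait for the role.
resource "aws_ecs_cluster" "example" {
  name = "example-cluster"

  # Nothing here references the role, so the ordering has to be explicit.
  depends_on = ["aws_iam_service_linked_role.ecs"]
}
```

Ten seconds is an arbitrary number, so this reduces the chance of hitting the race rather than eliminating it.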

aws_service_discovery_service.verdaccio-service-discovery-service: ServiceAlreadyExists: Service already exists.

This isn’t really a Terraform error so much as it is an AWS error that’s just difficult to figure out. As far as I can tell, you can’t actually see service discovery services in the AWS console, so you have to use the CLI.

$ aws servicediscovery list-services

$ aws servicediscovery delete-service --id SRV-blah-from-last-command

The Service Discovery instance could not be registered. (reference)

This was likely fixed by the steps below, but I made a lot of changes, so I’m not positive:

Set a health_check_custom_config. This is because that matches what I did in step #1 of this reference.

health_check_custom_config {
        failure_threshold = 1
    }

Make sure “terraform apply” doesn’t keep trying to change an ID to “srv-foo”, where “foo” is a service that doesn’t actually exist from “aws servicediscovery list-services”.
1. If this happens, do “terraform destroy” and recreate everything.

Error acquiring the state lock: ConditionalCheckFailedException: The conditional request failed (reference)

I don’t know how this happened exactly, but I did this from my local machine (with the tfstate file pointing to a remote back-end as mentioned here): terraform force-unlock UUID_OF_LOCK

You get the UUID from the error message where you originally saw ConditionalCheckFailedException.

I had this happen on CircleCI and I think I do know the cause:

I started a workflow, but I’d forgotten a variable, so the container was hanging on “Type the value of FOO:“.
I canceled the job, which means I think Terraform didn’t have a chance to release the lock

Can’t import aws_db_instance.default, would collide with an existing resource

This likely means that you already imported the resource. The solution was to do “terraform state rm <instance ID as shown in <terraform state list>>” before doing “terraform import”. I believe I can run the “rm” command even if it doesn’t exist without worrying about an error.

aws_db_instance.default: Error modifying DB Instance arn:aws:rds:us-west-2:212785478310:db:botland: InvalidParameterValue: The parameter DBInstanceIdentifier is not a valid identifier. Identifiers must begin with a letter; must contain only ASCII letters, digits, and hyphens; and must not end with a hyphen or contain two consecutive hyphens.

This is a bug (reference).

Aurora Serverless doesn’t support DB subnet groups with subnets in the same Availability Zone. Choose a DB subnet group with subnets in different Availability Zones.

This is just a general AWS problem. You can’t have two subnets specified that are in the same availability zone. Also, according to this:

You can’t give an Aurora Serverless DB cluster a public IP address. You can access an Aurora Serverless DB cluster only from within a virtual private cloud (VPC) based on the Amazon VPC service.

This means that you should probably only be using private subnets, and Aurora specifically needs to run in multiple availability zones.
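
Before putting a set of subnets into the DB subnet group, you can list their Availability Zones and look for repeats. This is a minimal sketch, assuming a configured AWS CLI; has_duplicate_az is a hypothetical helper and the subnet IDs are placeholders:

```shell
# Hypothetical helper: read Availability Zone names on stdin (separated by any
# whitespace) and succeed if any zone appears more than once.
has_duplicate_az() {
  tr -s '[:space:]' '\n' | sort | uniq -d | grep -q .
}

# Usage (not run here; subnet IDs are placeholders):
#   aws ec2 describe-subnets --subnet-ids subnet-aaa subnet-bbb \
#       --query 'Subnets[].AvailabilityZone' --output text | has_duplicate_az \
#     && echo "two subnets share an Availability Zone"
```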

Terraform is changing the desired_count when updating a service

Suppose there’s a scenario where you want to update a service without changing the desired_count. aws_ecs_service technically says desired_count is optional, but it defaults to 0 if you don’t specify it, which means your service will be drained of all tasks.

To work around this, you can use ignore_changes; it lets you ignore the diff for a particular set of attributes:

resource "aws_ecs_service" "verdaccio" {
  name = "verdaccio"
  desired_count = 0

  # Allow external changes without Terraform plan difference
  lifecycle {
    ignore_changes = ["desired_count"]
  }
}

If the only change to your service is desired_count, then “ignore_changes” results in nothing actually changing. However, I’m not sure what happens if both desired_count and something else change; it may actually set your desired_count back to 0.

SNS - “You provided a certificate of type SANDBOX”

When you have a sandbox certificate but a production application, you simply need to change your platform from APNS to APNS_SANDBOX:

resource "aws_sns_platform_application" "apns_application" {
  name = "apns_application_bot_land"
  platform = "APNS_SANDBOX"

  platform_credential = "${var.APNS_PRIVATE_KEY}"
  platform_principal = "${var.APNS_CERTIFICATE}"
}

Full error:

aws_sns_platform_application.apns_application: Error creating SNS platform application: InvalidParameter: Invalid parameter: Attributes Reason: You provided a certificate of type SANDBOX, which cannot be used to create an application of type iOS Production. Please select an application of type SANDBOX or provide a certificate of type iOS Production

Terraform source code on GitHub showing the relevant file

SNS “Platform credentials are invalid”

Full error:

aws_sns_platform_application.apns_application: Error creating SNS platform application: InvalidParameter: Invalid parameter: Attributes Reason: Platform credentials are invalid

This happened to me when I was extracting the encrypted private key rather than the unencrypted private key. I used the AWS Console to manually extract the certificate and key, compared them to what I was getting, and saw that they were different. This SO post ended up clarifying things for me (it had me run this command: “openssl pkcs12 -in new-ios-app.pfx -nodes -clcerts”).
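
A minimal sketch for extracting the two pieces separately and checking that the key came out unencrypted. It assumes the standard PEM markers for encrypted keys; is_encrypted_key, cert.pem, and key.pem are hypothetical names:

```shell
# Hypothetical check: succeed if the PEM text on stdin contains an encrypted
# private key. Looks for the PKCS#8 header or the legacy "Proc-Type" marker.
is_encrypted_key() {
  grep -qE 'BEGIN ENCRYPTED PRIVATE KEY|Proc-Type: 4,ENCRYPTED'
}

# Usage (not run here; new-ios-app.pfx is the file name from the command
# above, the output file names are placeholders):
#   openssl pkcs12 -in new-ios-app.pfx -nodes -clcerts -nokeys -out cert.pem
#   openssl pkcs12 -in new-ios-app.pfx -nodes -nocerts -out key.pem
#   is_encrypted_key < key.pem && echo "key is still encrypted"
```

In the resource shown earlier, the key is what goes into platform_credential and the certificate is what goes into platform_principal.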

“terraform output” gives errors

I got this error when trying to just run “terraform output”:

Error configuring the backend “s3”: SignatureDoesNotMatch: Signature expired: 20190312T033800Z is now earlier than 20190312T153349Z (20190312T154849Z - 15 min.)

status code: 403, request id: 53cddf4a-44de-11e9-ab7b-1bd060f64677

I actually use Terragrunt, but Terragrunt also wouldn’t work until I ran it in the root directory.
