
Using Terraform Remote State on AWS


Resources and Data Sources

Terraform is an Infrastructure as Code (IaC) tool by HashiCorp. It is used to manage the entire lifecycle of components in our infrastructure. The most popular Terraform "providers" are AWS, Azure, GCP and Kubernetes, but there are thousands of them. We can use a combination of these providers to build our infrastructure.

These providers work like libraries: they expose "data" and "resource" entities which correspond to cloud services, configuration or more dynamic values. We still need some idea of the capabilities of these services to use them, but the definitions are usually pretty compact. An example DynamoDB resource could look like this:

resource "aws_dynamodb_table" "users_table" {
  name             = "users"
  hash_key         = "userId"
  billing_mode     = "PAY_PER_REQUEST"
  stream_enabled   = true
  stream_view_type = "NEW_AND_OLD_IMAGES"

  attribute {
    name = "userId"
    type = "S"
  }

  replica {
    region_name = "us-east-2"
  }
}

We can also refer to an external shell script as a data source in Terraform, like below:

data "external" "my_external_script" {
  # hypothetical script path; the program must print a JSON object to stdout
  program = ["bash", "${path.module}/my_script.sh"]
}

Most of my experience with Terraform is creating IAM policies, Lambda functions, CloudWatch alarms/triggers, S3 buckets and DynamoDB tables on AWS. We can create these resources declaratively and then update or delete them when needed. These resource changes should be version controlled in git, so we can see changes to the Terraform files with each commit.

Plan output

One of the two most useful commands in Terraform is plan, which creates an execution plan in a human-readable format. Terraform builds the plan by comparing the existing remote resources with the Terraform files. It lists the actions (create, update or delete) that should be performed on resources in order to synchronize our infrastructure with the Terraform configuration. The plan output can also be passed to the apply command, so using them together would look like this:

terraform plan -var-file=nonprod.tfvars -out tfplan
terraform apply tfplan
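
If we want to review the saved plan before applying it, terraform show can render it; a small sketch (tfplan matches the file name used above):

```shell
terraform show tfplan          # human-readable summary of the saved plan
terraform show -json tfplan    # machine-readable form, handy for CI policy checks
```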

State Files

Sitting between Terraform configurations and real-world resources is the Terraform state. The main purpose of state is to map remote resources to objects in the Terraform configuration. The state file also keeps metadata on resources. Since deleting an object from the configuration should also result in its deletion at the cloud provider, Terraform needs a reference point for previously existing entries and their dependencies.

By default, state is stored locally in a file named terraform.tfstate. This file is human-readable JSON, yet it is not intended to be modified by hand.

Starting from scratch on an empty AWS account, creating and managing all resources with Terraform from a single laptop would be straightforward, but that is usually not the case. We could have pre-existing resources in our account, or we might modify our infrastructure during an incident or simply by mistake. This causes something called "state drift": a situation where the Terraform state and the actual remote resources are out of sync during execution planning.

When the updated resources are already tracked in our state file, a terraform refresh will update the local state before planning any changes, but things get complicated for resources that we do not track. Creating some resources could result in 409 Conflict errors because they already exist remotely but not in the state.
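
As a sketch, refreshing can also be done as part of planning in newer Terraform versions, which makes the state update reviewable before it is written:

```shell
# update the local state from remote resources without proposing other changes
terraform plan -refresh-only
terraform apply -refresh-only

# older, standalone form of the same operation
terraform refresh
```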

In this case, the import command can help us.

Importing Resources

If we have existing remote resources which we want to start managing with Terraform, we can import them into the state with a simple command. The Terraform Registry is a good place to find example import commands; most resources have an import example at the end of their documentation page.

terraform import aws_s3_bucket.bucket bucket-name

terraform import aws_dynamodb_table.users_table users

terraform import aws_cloudwatch_log_stream.foo Yada:SampleLogStream1234
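
Newer Terraform versions (1.5+) can also declare imports in configuration with an import block, so the import shows up in plan output before anything is written to state; a sketch for the S3 bucket example above:

```hcl
import {
  to = aws_s3_bucket.bucket
  id = "bucket-name"
}
```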

Remote State

A local state file is even more problematic when multiple people are working on the same infrastructure. State needs to be up to date, and it should store all related resources. A built-in way to solve this issue is using remote backends, and it is as easy as defining a backend block. Instead of a local terraform.tfstate file, state is then saved to a file in a remote bucket. Terraform Cloud by HashiCorp also provides a hosted solution for remote backends.

terraform {
  backend "s3" {
    bucket         = "my-bucket-to-store-state"
    key            = "path/to/tfstate-file"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"
  }
}

Running terraform init will then connect to the S3 bucket and use the same tfstate file for all operations. This is also useful when running Terraform in a CI/CD pipeline, because sharing the same remote file is much safer and cleaner than managing a local state file across multiple environments.
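
Backend values can also be supplied at init time instead of being hard-coded, which is convenient when the same configuration targets different buckets per environment; a sketch using partial backend configuration (bucket and key names are placeholders):

```shell
terraform init \
  -backend-config="bucket=my-bucket-to-store-state" \
  -backend-config="key=path/to/tfstate-file" \
  -backend-config="region=us-east-1"
```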

It is recommended to use DynamoDB tables for state locking on AWS, since working on the same infrastructure in parallel could cause unwanted issues, even with a remote state file.
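
The lock table referenced by the backend block above can itself be managed with Terraform; a minimal sketch, assuming the table name from that block (the table must have a string hash key named LockID):

```hcl
resource "aws_dynamodb_table" "terraform_state_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```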