DRY Multi-Environment Infrastructure with Terragrunt: Stacks, Dependencies, and Promotion

If you have ever copied a backend.tf into a fifth environment directory and changed one bucket key by hand, you already know the failure mode Terragrunt exists to fix. This article walks through a real multi-account, multi-environment layout: generating backend and provider config, sharing inputs cleanly, wiring dependencies between modules, and promoting a change from dev to prod without copy-paste.

What Terragrunt actually solves

Plain Terraform forces a choice. Either you use workspaces (one root module and one backend configuration shared by every environment, branching logic on terraform.workspace, no per-environment provider config) or you copy a root module per environment. The copy approach is honest about isolation but produces duplication: every environment repeats a backend block, a provider block, and a wall of variable defaults that drift apart over time.

Terragrunt is a thin wrapper around the terraform/tofu binary. It does not replace Terraform; it orchestrates it. Each leaf directory holds a terragrunt.hcl that points at a module, declares inputs, and lets Terragrunt generate the boilerplate just before each run. The result: backend keys, provider versions, and account wiring live in exactly one place.

Terragrunt works the same way against OpenTofu. Set TERRAGRUNT_TFPATH=tofu (renamed TG_TF_PATH in recent releases), or terraform_binary = "tofu" in your config, and every command below is unchanged.
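
As a minimal sketch of the config-file route, assuming you want the switch to apply to every unit, the attribute can sit at the top level of the shared root config (live/root.hcl in the layout used in this article). Whether an included parent's value is picked up by each leaf is worth confirming against your Terragrunt version.

```hcl
# live/root.hcl (sketch): run OpenTofu instead of Terraform.
# Assumes a `tofu` binary is available on PATH.
terraform_binary = "tofu"
```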

1. Repository layout: live vs. modules

Separate the definition of infrastructure (reusable modules) from the instantiation of it (the live tree). Modules can live in this repo or a versioned registry; the live tree is environment-specific and changes constantly.

infra/
  modules/                      # reusable, versioned Terraform modules
    network/
    eks/
    rds/
  live/
    root.hcl                    # account-agnostic shared config
    dev/
      account.hcl               # account_id, account_name
      us-east-1/
        region.hcl              # aws_region
        network/
          terragrunt.hcl
        eks/
          terragrunt.hcl
        rds/
          terragrunt.hcl
    prod/
      account.hcl
      us-east-1/
        region.hcl
        network/
          terragrunt.hcl
        eks/
          terragrunt.hcl
        rds/
          terragrunt.hcl

The hierarchy environment / region / component is the load-bearing convention. The directory path is the identity of a unit of infrastructure, and we will derive the state key directly from it so two components can never collide.

2. Generating backend config with remote_state

Put the backend definition in live/root.hcl once. The key is derived from the relative path between the root config and the leaf, so each component lands at a unique, predictable path in state.

# live/root.hcl
remote_state {
  backend = "s3"

  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }

  config = {
    bucket = "acme-tfstate-${local.account_vars.locals.account_id}"
    key    = "${path_relative_to_include()}/terraform.tfstate"
    region = local.region_vars.locals.aws_region

    encrypt      = true
    use_lockfile = true
  }
}

locals {
  account_vars = read_terragrunt_config(find_in_parent_folders("account.hcl"))
  region_vars  = read_terragrunt_config(find_in_parent_folders("region.hcl"))
}

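A few details that matter for correctness. path_relative_to_include() returns the leaf's path relative to root.hcl (for example dev/us-east-1/network), so the state key mirrors the directory tree. if_exists = "overwrite_terragrunt" overwrites a backend.tf that Terragrunt generated earlier but errors out on one written by hand. use_lockfile enables S3-native state locking and needs a Terraform or OpenTofu release that supports it (Terraform 1.10 or later).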

For an Azure backend the shape is the same, only the config differs:

remote_state {
  backend = "azurerm"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    resource_group_name  = "rg-tfstate"
    storage_account_name = "acmetfstate${local.account_vars.locals.account_id}"
    container_name       = "tfstate"
    key                  = "${path_relative_to_include()}/terraform.tfstate"
  }
}

3. Generating provider config with generate blocks

remote_state only handles the backend. For the provider, use a generate block so every component gets a consistently configured, account-correct provider without repeating it.

# live/root.hcl  (continued)
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "${local.region_vars.locals.aws_region}"

  assume_role {
    role_arn = "arn:aws:iam::${local.account_vars.locals.account_id}:role/terraform-exec"
  }

  default_tags {
    tags = {
      Environment = "${local.account_vars.locals.account_name}"
      ManagedBy   = "terragrunt"
    }
  }
}
EOF
}

This is where multi-account isolation becomes real: each environment’s account.hcl carries a different account_id, so the generated provider assumes a role in the right account. There is no shared credential blob, and a dev apply cannot land in prod by accident, because the role ARN is computed from the directory you are standing in.

# live/prod/account.hcl
locals {
  account_name = "prod"
  account_id   = "222222222222"
}

# live/dev/account.hcl
locals {
  account_name = "dev"
  account_id   = "111111111111"
}
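
root.hcl also reads region.hcl, which the layout lists but the article does not show. A minimal sketch, assuming the file only needs to expose the aws_region local that root.hcl references:

```hcl
# live/dev/us-east-1/region.hcl (sketch)
locals {
  aws_region = "us-east-1"
}
```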

4. Sharing inputs with include and read_terragrunt_config

Each leaf terragrunt.hcl pulls in the root via an include block. include is what activates the generated backend and provider; without it the leaf would have neither.

# live/dev/us-east-1/network/terragrunt.hcl
include "root" {
  path = find_in_parent_folders("root.hcl")
}

terraform {
  source = "${dirname(find_in_parent_folders("root.hcl"))}/../modules/network"
  # In production, prefer a pinned, versioned source:
  # source = "git::git@github.com:acme/infra-modules.git//network?ref=v1.4.0"
}

locals {
  env_vars = read_terragrunt_config(find_in_parent_folders("account.hcl"))
}

inputs = {
  vpc_cidr           = "10.10.0.0/16"
  environment        = local.env_vars.locals.account_name
  enable_nat_gateway = true
  single_nat_gateway = true   # one NAT in dev to save money
}

read_terragrunt_config parses another HCL file and exposes its locals, so common facts (account name, region, org-wide CIDR plan) are defined once and read everywhere. Inputs declared in the leaf are merged with anything from the included config, with the leaf winning — that is exactly the override behavior you want for per-environment tuning.
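
The merge behavior can be sketched as follows. The input names owner and log_retention_days are hypothetical and only serve to show which side wins; they are not variables of the modules used in this article.

```hcl
# live/root.hcl (sketch): org-wide defaults.
# `owner` and `log_retention_days` are hypothetical input names.
inputs = {
  owner              = "platform-team"
  log_retention_days = 30
}

# live/dev/us-east-1/network/terragrunt.hcl (sketch)
include "root" {
  path = find_in_parent_folders("root.hcl")
}

inputs = {
  log_retention_days = 7   # leaf value overrides the root default
  # owner is not set here, so "platform-team" comes from root.hcl
}
```

With the default include behavior the merge is shallow: a map-valued input set in the leaf replaces the root's map as a whole rather than being merged key by key.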

Pin your module source to a tag or commit in anything beyond a sandbox. An unpinned source means a plan today and an apply tomorrow can run different module code. Versioned sources are also what make promotion (Step 7) a deliberate, reviewable act.

5. Wiring dependencies between modules

EKS needs the VPC’s subnet IDs. RDS needs them too. Encode that with a dependency block, which reads another unit’s outputs and exposes them as dependency.<name>.outputs.<key>.

# live/dev/us-east-1/eks/terragrunt.hcl
include "root" {
  path = find_in_parent_folders("root.hcl")
}

terraform {
  source = "${dirname(find_in_parent_folders("root.hcl"))}/../modules/eks"
}

dependency "network" {
  config_path = "../network"

  mock_outputs = {
    vpc_id             = "vpc-00000000000000000"
    private_subnet_ids = ["subnet-1111", "subnet-2222", "subnet-3333"]
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan", "init"]
}

inputs = {
  cluster_name       = "dev-platform"
  kubernetes_version = "1.31"
  vpc_id             = dependency.network.outputs.vpc_id
  subnet_ids         = dependency.network.outputs.private_subnet_ids
}

The mock_outputs block is the part people get wrong. When you plan the EKS unit before the network has been applied, its real outputs do not exist, and the plan would fail trying to read them. Mock values let plan, validate, and init proceed with placeholders. The mock_outputs_allowed_terraform_commands allowlist is critical: it ensures apply and destroy are never fed fake subnet IDs. If the network unit has no real outputs yet, an apply of EKS fails instead of proceeding with placeholders.

Because RDS depends on ../network the same way, Terragrunt now knows the order: network first, then EKS and RDS. You did not write that order anywhere; it is inferred from the dependency graph.
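
For completeness, the RDS unit might look like the sketch below. It mirrors the EKS unit; the input names under inputs are assumptions and need to match whatever variables your rds module actually declares.

```hcl
# live/dev/us-east-1/rds/terragrunt.hcl (sketch)
include "root" {
  path = find_in_parent_folders("root.hcl")
}

terraform {
  source = "${dirname(find_in_parent_folders("root.hcl"))}/../modules/rds"
}

dependency "network" {
  config_path = "../network"

  # Placeholders used only for the commands listed below.
  mock_outputs = {
    vpc_id             = "vpc-00000000000000000"
    private_subnet_ids = ["subnet-1111", "subnet-2222", "subnet-3333"]
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan", "init"]
}

inputs = {
  # Hypothetical input names; adjust to your module's variables.
  identifier = "dev-app-db"
  vpc_id     = dependency.network.outputs.vpc_id
  subnet_ids = dependency.network.outputs.private_subnet_ids
}
```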

6. run-all for whole-environment plans and applies

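run-all walks every terragrunt.hcl under the current directory, builds the dependency DAG, and runs your command in topological order, applying dependencies before dependents and parallelizing independent units. Recent Terragrunt releases spell the same operation run --all and deprecate run-all, so check the CLI reference for the version you run.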

# Stand up the entire dev/us-east-1 stack in dependency order
cd infra/live/dev/us-east-1
terragrunt run-all plan
terragrunt run-all apply

Two operational notes:

A minimal CI stage (GitHub Actions) for a single environment:

name: terragrunt-plan
on: [pull_request]

jobs:
  plan:
    runs-on: ubuntu-latest
    permissions:
      id-token: write          # OIDC, no long-lived keys
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111111111111:role/ci-terraform
          aws-region: us-east-1
      - uses: gruntwork-io/terragrunt-action@v2
        with:
          tg_command: "run-all plan"
          tg_dir: "infra/live/dev/us-east-1"

7. A promotion workflow: dev to staging to prod

Promotion is the payoff. The module code is identical across environments; only the inputs and account wiring differ, and those live in small, reviewable files. A change flows like this:

  1. Edit the module in modules/ and tag a release, e.g. v1.5.0.
  2. Bump dev by changing the ref= in live/dev/.../terragrunt.hcl, open a PR, review the run-all plan, merge, apply.
  3. Bake, then bump staging to v1.5.0. Same diff, different directory, separate PR.
  4. Promote to prod by changing the same ref= under live/prod/.... The prod PR diff is one line, which is precisely what you want a reviewer to see.
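
Midway through that flow, the two environments differ only in the ref. A sketch, reusing the example repository URL from the network unit's commented-out source:

```hcl
# live/dev/us-east-1/network/terragrunt.hcl (sketch): dev moves first
terraform {
  source = "git::git@github.com:acme/infra-modules.git//network?ref=v1.5.0"
}

# live/prod/us-east-1/network/terragrunt.hcl (sketch): prod stays on the
# previous tag until the promotion PR changes this one line
terraform {
  source = "git::git@github.com:acme/infra-modules.git//network?ref=v1.4.0"
}
```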

Environment-specific behavior stays in inputs, not code. The prod overrides are explicit and isolated:

# live/prod/us-east-1/network/terragrunt.hcl  (inputs only)
inputs = {
  vpc_cidr           = "10.30.0.0/16"
  single_nat_gateway = false   # one NAT per AZ in prod for resilience
}

# live/prod/us-east-1/eks/terragrunt.hcl  (inputs only)
inputs = {
  cluster_name = "prod-platform"
  node_min     = 3
  node_max     = 20
}

Because the backend key is derived from the path and the provider role is derived from account.hcl, the prod state is in the prod account’s bucket and the apply assumes the prod role — guaranteed by structure, not by a runbook step someone might skip.

Keep the module version pinned per environment rather than floating all environments off main. The whole point of promotion is that prod runs code that already survived dev and staging. If every environment tracks main, you have re-invented “deploy straight to prod.”

Verify

Confirm the wiring does what you think before trusting it.

# 1. Inspect the generated files for one unit (do not commit these)
cd infra/live/dev/us-east-1/eks
terragrunt init
# With a terraform { source } block, files are generated in the working copy under .terragrunt-cache
find .terragrunt-cache \( -name backend.tf -o -name provider.tf \) -exec cat {} +

# 2. Confirm the dependency graph and apply order
cd infra/live/dev/us-east-1
terragrunt graph-dependencies    # emits Graphviz DOT; pipe to `dot -Tpng` if desired

# 3. Validate every unit without changing any resources (init still contacts the backend)
terragrunt run-all validate

# 4. Confirm state isolation: keys must differ per component
aws s3 ls s3://acme-tfstate-111111111111/dev/us-east-1/ --recursive

You are looking for three things: each unit generated its own backend.tf/provider.tf, the dependency graph shows network upstream of eks and rds, and the S3 listing shows distinct keys like dev/us-east-1/network/terraform.tfstate and dev/us-east-1/eks/terraform.tfstate.
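
For reference, the generated backend file for the dev EKS unit should look roughly like the sketch below, given the root.hcl and account.hcl values used in this article. Attribute order and formatting may differ, and Terragrunt prepends a header comment marking the file as generated.

```hcl
# backend.tf for live/dev/us-east-1/eks (approximate shape of the generated file)
terraform {
  backend "s3" {
    bucket       = "acme-tfstate-111111111111"
    key          = "dev/us-east-1/eks/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true
  }
}
```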

Checklist

When Terragrunt is the wrong tool

Terragrunt earns its keep when you have many near-identical stacks across accounts and regions. It is overhead you should decline when:

The exit ramp matters too. Because Terragrunt only generates standard Terraform files and calls the normal binary, leaving is tractable: commit the generated backend.tf/provider.tf, inline the inputs as .tfvars, and you are back to vanilla Terraform with state intact. Adopt it for the duplication it removes, not because it is fashionable — and keep the generated output boring enough that walking away is always an option.

Next steps

Add a _envcommon/ layer for component config shared across all environments (the EKS inputs that never change between dev and prod), include it alongside root.hcl, and let each environment override only what truly differs. That collapses the last of the duplication and makes the promotion diff smaller still.
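
One way this could look is sketched below. The location live/_envcommon/eks.hcl and the choice of which inputs count as shared are assumptions for illustration.

```hcl
# live/_envcommon/eks.hcl (sketch): inputs identical in every environment
inputs = {
  kubernetes_version = "1.31"
}

# live/prod/us-east-1/eks/terragrunt.hcl (sketch)
include "root" {
  path = find_in_parent_folders("root.hcl")
}

# A second, separately labeled include pulls in the shared component config.
include "envcommon" {
  path = "${dirname(find_in_parent_folders("root.hcl"))}/_envcommon/eks.hcl"
}

# Only what truly differs in prod stays in the leaf.
inputs = {
  cluster_name = "prod-platform"
  node_min     = 3
  node_max     = 20
}
```

Because the shared file is not named terragrunt.hcl, run-all does not treat it as a unit of its own.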
