If you have ever copied a backend.tf into a fifth environment directory and changed one bucket key by hand, you already know the failure mode Terragrunt exists to fix. This article walks through a real multi-account, multi-environment layout: generating backend and provider config, sharing inputs cleanly, wiring dependencies between modules, and promoting a change from dev to prod without copy-paste.
What Terragrunt actually solves
Plain Terraform forces a choice. Either you use workspaces (one state, branching logic on terraform.workspace, no per-environment provider config) or you copy a root module per environment. The copy approach is honest about isolation but produces duplication: every environment repeats a backend block, a provider block, and a wall of variable defaults that drift apart over time.
Terragrunt is a thin wrapper around the terraform/tofu binary. It does not replace Terraform; it orchestrates it. Each leaf directory holds a terragrunt.hcl that points at a module, declares inputs, and lets Terragrunt generate the boilerplate at apply time. The result: backend keys, provider versions, and account wiring live in exactly one place.
Terragrunt works identically against OpenTofu. Set TERRAGRUNT_TFPATH=tofu (or terraform_binary = "tofu" in your config) and every command below is unchanged.
1. Repository layout: live vs. modules
Separate the definition of infrastructure (reusable modules) from the instantiation of it (the live tree). Modules can live in this repo or a versioned registry; the live tree is environment-specific and changes constantly.
infra/
  modules/                # reusable, versioned Terraform modules
    network/
    eks/
    rds/
  live/
    root.hcl              # account-agnostic shared config
    dev/
      account.hcl         # account_id, account_name
      us-east-1/
        region.hcl        # aws_region
        network/
          terragrunt.hcl
        eks/
          terragrunt.hcl
        rds/
          terragrunt.hcl
    prod/
      account.hcl
      us-east-1/
        region.hcl
        network/
          terragrunt.hcl
        eks/
          terragrunt.hcl
        rds/
          terragrunt.hcl
The hierarchy environment / region / component is the load-bearing convention. The directory path is the identity of a unit of infrastructure, and we will derive the state key directly from it so two components can never collide.
2. Generating backend config with remote_state
Put the backend definition in live/root.hcl once. The key is derived from the relative path between the root config and the leaf, so each component lands at a unique, predictable path in state.
# live/root.hcl
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket       = "acme-tfstate-${local.account_vars.locals.account_id}"
    key          = "${path_relative_to_include()}/terraform.tfstate"
    region       = local.region_vars.locals.aws_region
    encrypt      = true
    use_lockfile = true
  }
}

locals {
  account_vars = read_terragrunt_config(find_in_parent_folders("account.hcl"))
  region_vars  = read_terragrunt_config(find_in_parent_folders("region.hcl"))
}
A few details that matter for correctness:
- path_relative_to_include() returns the path from the included (parent) config to the current unit, e.g. dev/us-east-1/eks. That becomes the state key, so isolation is automatic.
- if_exists = "overwrite_terragrunt" only overwrites files Terragrunt itself generated, leaving any hand-written backend.tf untouched.
- use_lockfile = true uses S3-native conditional writes for state locking. This is the current approach and removes the need for a DynamoDB lock table; on Terraform versions before 1.10, which lack S3-native locking, you would instead set dynamodb_table here.
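To make the key derivation concrete, here is roughly what the generated backend.tf would contain for the dev EKS unit. This is a sketch, not captured output: it assumes the dev account ID shown later in account.hcl and a region.hcl that sets aws_region to us-east-1, and Terragrunt also prepends a signature comment (omitted here) that it uses to recognize files it generated.

```hcl
# backend.tf as generated for live/dev/us-east-1/eks (approximate)
terraform {
  backend "s3" {
    bucket       = "acme-tfstate-111111111111"            # from dev/account.hcl
    key          = "dev/us-east-1/eks/terraform.tfstate"  # path_relative_to_include()
    region       = "us-east-1"                            # assumed region.hcl value
    encrypt      = true
    use_lockfile = true
  }
}
```

The network unit in the same region would differ only in the key, dev/us-east-1/network/terraform.tfstate, which is why two components cannot share a state file by accident.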
For an Azure backend the shape is the same, only the config differs:
remote_state {
  backend = "azurerm"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    resource_group_name  = "rg-tfstate"
    storage_account_name = "acmetfstate${local.account_vars.locals.account_id}"
    container_name       = "tfstate"
    key                  = "${path_relative_to_include()}/terraform.tfstate"
  }
}
3. Generating provider config with generate blocks
remote_state only handles the backend. For the provider, use a generate block so every component gets a consistently configured, account-correct provider without repeating it.
# live/root.hcl (continued)
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "${local.region_vars.locals.aws_region}"
  assume_role {
    role_arn = "arn:aws:iam::${local.account_vars.locals.account_id}:role/terraform-exec"
  }
  default_tags {
    tags = {
      Environment = "${local.account_vars.locals.account_name}"
      ManagedBy   = "terragrunt"
    }
  }
}
EOF
}
This is where multi-account isolation becomes real: each environment’s account.hcl carries a different account_id, so the generated provider assumes a role in the right account. There is no shared credential blob, and a dev apply cannot quietly land in prod, because the role ARN is computed from the directory you are standing in.
# live/prod/account.hcl
locals {
  account_name = "prod"
  account_id   = "222222222222"
}

# live/dev/account.hcl
locals {
  account_name = "dev"
  account_id   = "111111111111"
}
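The root config also reads a region.hcl, which the layout places in each region directory but which is not shown above. A minimal sketch, assuming the file only needs to expose the aws_region local that root.hcl references:

```hcl
# live/dev/us-east-1/region.hcl (assumed contents)
locals {
  aws_region = "us-east-1" # read in root.hcl as local.region_vars.locals.aws_region
}
```

Adding a second region would then mean adding a sibling directory with its own region.hcl, with no change to root.hcl.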
4. Sharing inputs with include and read_terragrunt_config
Each leaf terragrunt.hcl pulls in the root via an include block. include is what activates the generated backend and provider; without it the leaf would have neither.
# live/dev/us-east-1/network/terragrunt.hcl
include "root" {
  path = find_in_parent_folders("root.hcl")
}

terraform {
  source = "${dirname(find_in_parent_folders("root.hcl"))}/../modules/network"
  # In production, prefer a pinned, versioned source:
  # source = "git::git@github.com:acme/infra-modules.git//network?ref=v1.4.0"
}

locals {
  env_vars = read_terragrunt_config(find_in_parent_folders("account.hcl"))
}

inputs = {
  vpc_cidr           = "10.10.0.0/16"
  environment        = local.env_vars.locals.account_name
  enable_nat_gateway = true
  single_nat_gateway = true # one NAT in dev to save money
}
read_terragrunt_config parses another HCL file and exposes its locals, so common facts (account name, region, org-wide CIDR plan) are defined once and read everywhere. Inputs declared in the leaf are merged with anything from the included config, with the leaf winning — that is exactly the override behavior you want for per-environment tuning.
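As an illustration of that merge, suppose root.hcl also declared an inputs block. The root.hcl shown earlier has none, so the block below is a hypothetical addition used only to show which side wins.

```hcl
# live/root.hcl (hypothetical addition: defaults passed to every unit)
inputs = {
  environment        = local.account_vars.locals.account_name
  single_nat_gateway = true
}

# live/prod/us-east-1/network/terragrunt.hcl
inputs = {
  vpc_cidr           = "10.30.0.0/16"
  single_nat_gateway = false # the leaf value overrides the root default
}

# Effective inputs for the prod network unit after the merge:
#   environment        = "prod"          (from root)
#   vpc_cidr           = "10.30.0.0/16"  (from leaf)
#   single_nat_gateway = false           (leaf wins)
```

Terragrunt passes inputs to Terraform as TF_VAR_ environment variables, so a root-level default that a particular module does not declare is simply ignored by that module.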
Pin your module source to a tag or commit in anything beyond a sandbox. An unpinned source means a plan today and an apply tomorrow can run different module code. Versioned sources are also what make promotion (Step 7) a deliberate, reviewable act.
5. Wiring dependencies between modules
EKS needs the VPC’s subnet IDs. RDS needs them too. Encode that with a dependency block, which reads another unit’s outputs and exposes them as dependency.<name>.outputs.<key>.
# live/dev/us-east-1/eks/terragrunt.hcl
include "root" {
  path = find_in_parent_folders("root.hcl")
}

terraform {
  source = "${dirname(find_in_parent_folders("root.hcl"))}/../modules/eks"
}

dependency "network" {
  config_path = "../network"
  mock_outputs = {
    vpc_id             = "vpc-00000000000000000"
    private_subnet_ids = ["subnet-1111", "subnet-2222", "subnet-3333"]
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan", "init"]
}

inputs = {
  cluster_name       = "dev-platform"
  kubernetes_version = "1.31"
  vpc_id             = dependency.network.outputs.vpc_id
  subnet_ids         = dependency.network.outputs.private_subnet_ids
}
The mock_outputs block is the part people get wrong. When you plan the EKS unit and the network has not been applied yet, its real outputs do not exist — the plan would fail trying to read them. Mock values let plan, validate, and init proceed with placeholders. The mock_outputs_allowed_terraform_commands allowlist is critical: it ensures apply and destroy are never fed fake subnet IDs. An apply will only run once the real outputs are available.
Because RDS depends on ../network the same way, Terragrunt now knows the order: network first, then EKS and RDS. You did not write that order anywhere; it is inferred from the dependency graph.
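For completeness, a sketch of what the RDS unit could look like. The dependency block mirrors the EKS one; the input names (identifier, vpc_id, subnet_ids) are hypothetical and would need to match whatever variables your rds module actually declares.

```hcl
# live/dev/us-east-1/rds/terragrunt.hcl (sketch)
include "root" {
  path = find_in_parent_folders("root.hcl")
}

terraform {
  source = "${dirname(find_in_parent_folders("root.hcl"))}/../modules/rds"
}

dependency "network" {
  config_path = "../network"

  # Same placeholders as the EKS unit; never used for apply or destroy
  mock_outputs = {
    vpc_id             = "vpc-00000000000000000"
    private_subnet_ids = ["subnet-1111", "subnet-2222", "subnet-3333"]
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan", "init"]
}

inputs = {
  # Hypothetical variable names; align them with your module
  identifier = "dev-app-db"
  vpc_id     = dependency.network.outputs.vpc_id
  subnet_ids = dependency.network.outputs.private_subnet_ids
}
```

EKS and RDS have no dependency on each other here, so they sit at the same level of the graph and can run in parallel once network is applied.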
6. run-all for whole-environment plans and applies
run-all walks every terragrunt.hcl under the current directory, builds the dependency DAG, and runs your command in topological order — applying dependencies before dependents and parallelizing independent units.
# Stand up the entire dev/us-east-1 stack in dependency order
cd infra/live/dev/us-east-1
terragrunt run-all plan
terragrunt run-all apply
Two operational notes:
- Always read a run-all plan before a run-all apply. Because of mocked outputs, a run-all plan on a green field shows approximate plans for downstream modules. Treat it as a sanity check on intent, not a byte-exact preview.
- For non-interactive CI, add --terragrunt-non-interactive. To target a subset, use --terragrunt-include-dir / --terragrunt-exclude-dir. When a run fails partway, for example during teardown experiments, run-all reports per-unit results so you can see exactly which unit broke.
A minimal CI stage (GitHub Actions) for a single environment:
name: terragrunt-plan
on: [pull_request]
jobs:
  plan:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # OIDC, no long-lived keys
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111111111111:role/ci-terraform
          aws-region: us-east-1
      - uses: gruntwork-io/terragrunt-action@v2
        with:
          tg_command: "run-all plan"
          tg_dir: "infra/live/dev/us-east-1"
7. A promotion workflow: dev to staging to prod
Promotion is the payoff. The module code is identical across environments; only the inputs and account wiring differ, and those live in small, reviewable files. A change flows like this:
- Edit the module in modules/ and tag a release, e.g. v1.5.0.
- Bump dev by changing the ref= in live/dev/.../terragrunt.hcl, open a PR, review the run-all plan, merge, apply.
- Bake, then bump staging to v1.5.0. Same diff, different directory, separate PR.
- Promote to prod by changing the same ref= under live/prod/.... The prod PR diff is one line, which is precisely what you want a reviewer to see.
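Midway through that flow, the two environments might look like the sketch below. It assumes the pinned git source from the commented example in Step 4 and uses the network unit for illustration; the exact unit paths are an assumption.

```hcl
# live/dev/us-east-1/network/terragrunt.hcl (already bumped)
terraform {
  source = "git::git@github.com:acme/infra-modules.git//network?ref=v1.5.0"
}

# live/prod/us-east-1/network/terragrunt.hcl (still on the previous release)
terraform {
  source = "git::git@github.com:acme/infra-modules.git//network?ref=v1.4.0"
}
```

The prod promotion PR then changes only v1.4.0 to v1.5.0 on that one line.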
Environment-specific behavior stays in inputs, not code. The prod overrides are explicit and isolated:
# live/prod/us-east-1/network/terragrunt.hcl (inputs only)
inputs = {
  vpc_cidr           = "10.30.0.0/16"
  single_nat_gateway = false # one NAT per AZ in prod for resilience
}

# live/prod/us-east-1/eks/terragrunt.hcl (inputs only)
inputs = {
  cluster_name = "prod-platform"
  node_min     = 3
  node_max     = 20
}
Because the backend key is derived from the path and the provider role is derived from account.hcl, the prod state is in the prod account’s bucket and the apply assumes the prod role — guaranteed by structure, not by a runbook step someone might skip.
Keep the module version pinned per environment rather than floating all environments off main. The whole point of promotion is that prod runs code that already survived dev and staging. If every environment tracks main, you have re-invented “deploy straight to prod.”
Verify
Confirm the wiring does what you think before trusting it.
# 1. Inspect the generated files for one unit (do not commit these)
cd infra/live/dev/us-east-1/eks
terragrunt init
# With terraform.source set, Terragrunt generates into its .terragrunt-cache working copy
find .terragrunt-cache \( -name backend.tf -o -name provider.tf \) -exec cat {} +
# 2. Confirm the dependency graph and apply order
cd infra/live/dev/us-east-1
terragrunt graph-dependencies # emits Graphviz DOT; pipe to `dot -Tpng` if desired
# 3. Validate every unit without touching cloud state
terragrunt run-all validate
# 4. Confirm state isolation: keys must differ per component
aws s3 ls s3://acme-tfstate-111111111111/dev/us-east-1/ --recursive
You are looking for three things: each unit generated its own backend.tf/provider.tf, the dependency graph shows network upstream of eks and rds, and the S3 listing shows distinct keys like dev/us-east-1/network/terraform.tfstate and dev/us-east-1/eks/terraform.tfstate.
When Terragrunt is the wrong tool
Terragrunt earns its keep when you have many near-identical stacks across accounts and regions. It is overhead you should decline when:
- You have one or two environments. Two copied root modules are easier to read than a generated-config indirection layer a new hire has to learn.
- Your platform has a native equivalent. Terraform Stacks (HCP Terraform) and workspace-per-environment cover a real slice of this. If you are all-in on HCP Terraform (formerly Terraform Cloud), evaluate those options before adding Terragrunt.
- You can express it as one configurable module. If environments differ only by variable values and you are comfortable with workspaces, a single root with *.tfvars per environment may be enough.
The exit ramp matters too. Because Terragrunt only generates standard Terraform files and calls the normal binary, leaving is tractable: commit the generated backend.tf/provider.tf, inline the inputs as .tfvars, and you are back to vanilla Terraform with state intact. Adopt it for the duplication it removes, not because it is fashionable — and keep the generated output boring enough that walking away is always an option.
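As a sketch of the "inline the inputs" step, the prod network inputs from Step 7 would translate into a variables file like the one below. The filename is an assumption; any name passed with -var-file works.

```hcl
# prod.tfvars (hypothetical filename)
# Used as: terraform apply -var-file=prod.tfvars
vpc_cidr           = "10.30.0.0/16"
single_nat_gateway = false
```

Values that came from dependency outputs, such as vpc_id for the EKS unit, would need a different replacement in vanilla Terraform, for example a terraform_remote_state data source or a data lookup.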
Next steps
Add a _envcommon/ layer for component config shared across all environments (the EKS inputs that never change between dev and prod), include it alongside root.hcl, and let each environment override only what truly differs. That collapses the last of the duplication and makes the promotion diff smaller still.
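A minimal sketch of that layer, assuming a file named _envcommon/eks.hcl under live/ (the name and location are illustrative). It holds only the inputs that are identical everywhere, and the module source stays in each leaf so the per-environment version pin from Step 7 is preserved.

```hcl
# live/_envcommon/eks.hcl (hypothetical shared component config)
inputs = {
  kubernetes_version = "1.31"
}

# live/dev/us-east-1/eks/terragrunt.hcl
include "root" {
  path = find_in_parent_folders("root.hcl")
}

include "envcommon" {
  # Resolve the shared file relative to the directory that holds root.hcl
  path = "${dirname(find_in_parent_folders("root.hcl"))}/_envcommon/eks.hcl"
}

terraform {
  source = "${dirname(find_in_parent_folders("root.hcl"))}/../modules/eks"
}

inputs = {
  cluster_name = "dev-platform" # only what differs per environment
}
```

Because run-all only picks up files named terragrunt.hcl, the shared eks.hcl is never treated as a deployable unit on its own.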