Orchestrating Multi-Environment Infrastructure with Terraform Stacks

If you have run Terraform at scale, you know the shape of the pain. You write a module once, then you copy a thin root module per environment, wire up a workspace for each, and glue the whole thing together with a CI pipeline that hard-codes the dependency order. Adding a region means another workspace. Promoting a change means babysitting terraform apply across dev, staging, and prod in the right sequence, hoping nobody skips a step.

Terraform Stacks is HashiCorp’s answer to that sprawl. A Stack lets you declare your infrastructure as a set of components once, then declare the deployments (the environments) that instantiate those components, and HCP Terraform orchestrates plans and applies across all of them with a dependency graph it computes for you. This guide is a practitioner’s walk-through: the file structure, the wiring, the orchestration rules, and the migration path off workspaces and Terragrunt.

Stacks runs on HCP Terraform (and on Terraform Enterprise builds that support it). The authoring language and the tfstack.hcl / tfdeploy.hcl file formats are stable enough to build against, but treat specific block names, file extensions, and flags as version-sensitive, and check the terraform stacks CLI help and the documentation for your installed version.

1. Stacks vs the workspace-per-environment pattern: what changes

The mental model is the biggest shift, so anchor it before touching syntax.

In the classic pattern, the unit of work is a workspace: one state file, one set of variables, one terraform apply. An “environment” is a workspace (or a folder of workspaces), and you replicate configuration to replicate environments. Orchestration across them lives outside Terraform.

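With Stacks, there are two new units: the component (what gets built) and the deployment (where it gets built). The table maps each classic concept to its Stacks counterpart: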

| Concept                   | Classic workspaces      | Terraform Stacks                        |
|---------------------------|-------------------------|-----------------------------------------|
| Reusable infra definition | Root module per env     | `component` block (authored once)       |
| Environment instance      | A workspace             | A `deployment` block                    |
| State boundary            | One state per workspace | One state per component, per deployment |
| Cross-env orchestration   | External CI / scripts   | Built-in, graph-driven                  |
| Provider config           | Per root module         | Declared once, passed to components     |
| Change promotion          | Manual run ordering     | `orchestrate` rules + auto-approve      |

The practical consequence: you define network, data, and app once as components, then say “I want these in dev, staging, and prod, with these inputs each.” Stacks expands that into per-deployment, per-component states and plans them together. You stop maintaining N copies of the same root module.

A Stack is a directory containing two kinds of HCL:

Stack configuration (*.tfstack.hcl or tfstack.hcl) declares components, providers, and the variables/outputs of the Stack itself.
Deployment configuration (*.tfdeploy.hcl or tfdeploy.hcl) declares deployments, supplies their inputs, and defines orchestration rules.

stacks/platform/
  components.tfstack.hcl     # what the Stack is made of
  providers.tfstack.hcl      # provider wiring (can be one file)
  variables.tfstack.hcl      # Stack-level inputs
  deployments.tfdeploy.hcl   # which environments, and their values
  modules/
    network/
    app/

2. Authoring tfstack.hcl components and wiring providers

A component is a Terraform module plus the providers it should run with and the inputs it needs. Crucially, providers are configured at the Stack level and passed into components, rather than configured inside each module. This is what lets one component definition serve many deployments with different credentials or regions.

Start with provider requirements and the Stack’s own variables.

# variables.tfstack.hcl
variable "region" {
  type = string
}

variable "aws_role_arn" {
  type = string
}

variable "instance_count" {
  type    = number
  default = 2
}

# The OIDC token arrives as an ephemeral input, so no long-lived secrets
# live in the Stack. The identity_token block that mints it is declared in
# the deployment configuration, and each deployment passes
# identity_token = identity_token.aws.jwt.
variable "identity_token" {
  type      = string
  ephemeral = true
}

Now declare and configure providers. required_providers lives in the Stack, and each provider block names an instance you can hand to components. You do not need for_each here to vary the provider per environment: each deployment supplies its own region and role ARN as inputs, so you wire one logical provider.

# providers.tfstack.hcl
required_providers {
  aws = {
    source  = "hashicorp/aws"
    version = "~> 5.60"
  }
  random = {
    source  = "hashicorp/random"
    version = "~> 3.6"
  }
}

provider "aws" "this" {
  config {
    region = var.region

    # Exchange the workload identity JWT for short-lived AWS credentials.
    assume_role_with_web_identity {
      role_arn           = var.aws_role_arn
      web_identity_token = var.identity_token
    }
  }
}

provider "random" "this" {}

Then the components themselves. Each component points at a source module, passes typed inputs, and is granted a set of providers.

# components.tfstack.hcl
component "network" {
  source = "./modules/network"

  inputs = {
    region = var.region
  }

  providers = {
    aws = provider.aws.this
  }
}

component "app" {
  source = "./modules/app"

  inputs = {
    subnet_ids     = component.network.subnet_ids
    instance_count = var.instance_count
  }

  providers = {
    aws    = provider.aws.this
    random = provider.random.this
  }
}
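
Components and providers also accept for_each, which is how a single deployment can span several regions. The following is a hedged sketch of that variant, not part of the Stack above: the regions variable and the configurations provider name are illustrative assumptions, the credentials block is left out for brevity, and this component "network" would replace the single-region one rather than sit beside it.

```hcl
# Illustrative multi-region variant; `regions` and `configurations` are assumed names.
variable "regions" {
  type = set(string)
}

# One AWS provider configuration per region in the set.
provider "aws" "configurations" {
  for_each = var.regions

  config {
    region = each.value
    # Add the same assume_role_with_web_identity block as provider "aws" "this".
  }
}

# One instance of the network component per region, each bound to its regional provider.
component "network" {
  for_each = var.regions

  source = "./modules/network"

  inputs = {
    region = each.value
  }

  providers = {
    aws = provider.aws.configurations[each.value]
  }
}
```

A deployment would then supply something like regions = ["us-east-1", "us-west-2"], and downstream references should index the instance they need (for example component.network[each.value].subnet_ids). Confirm the exact indexing behavior against the documentation for your version.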

The referenced modules are ordinary Terraform modules with one rule: their provider requirements must be satisfied by what the component passes in. Inside ./modules/network you write variable, resource, and output blocks exactly as you would in any module. There is no terraform { backend ... } block; Stacks manages state.
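
To make that concrete, here is a minimal sketch of what ./modules/network might contain. The VPC, the CIDR ranges, the subnet count, and the availability-zone suffixes are assumptions chosen for illustration; the point is that the module declares which providers it requires but contains no provider or backend configuration of its own.

```hcl
# modules/network/main.tf (illustrative; resource names and CIDRs are assumptions)
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.60"
    }
  }
}

variable "region" {
  type = string
}

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# Two private subnets carved out of the VPC range, one per availability zone.
resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone = "${var.region}${element(["a", "b"], count.index)}"
}
```

The aws provider the module uses is whichever instance the component maps to aws in its providers argument.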

3. Declaring deployments and varsets in tfdeploy.hcl

The Stack configuration is environment-agnostic. The deployment configuration is where environments come to life. Each deployment block produces a full, independent instantiation of every component, with its own inputs and its own state.

# deployments.tfdeploy.hcl
deployment "dev" {
  inputs = {
    region         = "us-east-1"
    aws_role_arn   = "arn:aws:iam::111111111111:role/stacks-dev"
    instance_count = 1
  }
}

deployment "staging" {
  inputs = {
    region         = "us-east-1"
    aws_role_arn   = "arn:aws:iam::222222222222:role/stacks-staging"
    instance_count = 2
  }
}

deployment "prod" {
  inputs = {
    region         = "us-west-2"
    aws_role_arn   = "arn:aws:iam::333333333333:role/stacks-prod"
    instance_count = 6
  }
}

Every key in a deployment’s inputs maps to a Stack-level variable. Adding a fourth environment is now a copy-paste of one block, not a new workspace, new backend, and new pipeline wiring.
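
The AWS provider in section 2 authenticates with an OIDC token. In the Stacks documentation, the identity_token block that mints that token is declared in the deployment configuration and its JWT is handed to the Stack as an input. A minimal sketch of that wiring, assuming the Stack declares a string variable named identity_token (marked ephemeral) that the provider reads:

```hcl
# deployments.tfdeploy.hcl
# Mints a workload identity JWT for each run; the audience must match the
# trust policy on the IAM role being assumed.
identity_token "aws" {
  audience = ["aws.workload.identity"]
}

deployment "dev" {
  inputs = {
    region         = "us-east-1"
    aws_role_arn   = "arn:aws:iam::111111111111:role/stacks-dev"
    instance_count = 1

    # Assumes the Stack declares variable "identity_token" (string, ephemeral).
    identity_token = identity_token.aws.jwt
  }
}
```

Every deployment needs the same identity_token input; only the role ARN differs per account.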

To avoid repeating shared values, factor them into a locals block and reference it. A common pattern is a baseline map merged per environment:

# deployments.tfdeploy.hcl
locals {
  common = {
    instance_count = 2
  }
}

deployment "staging" {
  inputs = merge(local.common, {
    region       = "us-east-1"
    aws_role_arn = "arn:aws:iam::222222222222:role/stacks-staging"
  })
}
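
Because merge gives precedence to later arguments, a deployment can still override a baseline value. A short sketch reusing the local.common map from the example above:

```hcl
deployment "prod" {
  inputs = merge(local.common, {
    region         = "us-west-2"
    aws_role_arn   = "arn:aws:iam::333333333333:role/stacks-prod"
    instance_count = 6 # overrides local.common.instance_count (2)
  })
}
```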

For secrets and reusable variable bundles, reference an HCP Terraform variable set (varset) through a store block. This keeps credentials and shared config out of the repo and lets platform teams manage them centrally.

# deployments.tfdeploy.hcl
store "varset" "shared" {
  id       = "varset-AbC123XyZ"
  category = "terraform"
}

deployment "prod" {
  inputs = {
    region         = "us-west-2"
    aws_role_arn   = store.varset.shared.aws_role_arn
    instance_count = 6
  }
}

4. Passing outputs between components and cross-component dependencies

You already saw the key move in section 2: component.network.subnet_ids is referenced in the app component’s inputs. This single reference does two things. It passes the value from one component’s outputs to another’s inputs, and it declares the dependency — Stacks knows app must plan and apply after network, and it builds the DAG accordingly. You never hand-order them.

For an output of the network module to be referenceable, the module must expose it:

# modules/network/outputs.tf
output "subnet_ids" {
  value = aws_subnet.private[*].id
}
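
On the consuming side, the app module declares matching variables, and their types are what validation checks the component inputs against. A minimal sketch, assuming the file layout and descriptions shown here:

```hcl
# modules/app/variables.tf (illustrative)
variable "subnet_ids" {
  type        = list(string)
  description = "Private subnet IDs passed in from the network component"
}

variable "instance_count" {
  type        = number
  description = "Number of app instances for this deployment"
}
```

The module in section 5 also reads var.ami_id; that input would need its own variable here and a matching entry in the component's inputs.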

To surface values out of the Stack as a whole (for consumers, dashboards, or downstream Stacks), declare output blocks in the Stack configuration. Mark sensitive values so they are not printed in plans.

# outputs.tfstack.hcl
output "app_endpoint" {
  type        = string
  value       = component.app.endpoint
  description = "Public endpoint for the app tier"
}

output "db_password" {
  type      = string
  value     = component.app.db_password
  sensitive = true
}
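
For component.app.db_password to resolve, the app module has to export an output with that name. One plausible shape, assuming the password is generated with the random provider that the app component already receives (the random_password resource and its settings are assumptions, not taken from the module itself):

```hcl
# modules/app/outputs.tf (illustrative; the random_password resource is an assumption)
resource "random_password" "db" {
  length  = 24
  special = false
}

# Outputs derived from sensitive values must themselves be marked sensitive.
output "db_password" {
  value     = random_password.db.result
  sensitive = true
}
```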

The dependency graph is per deployment. dev’s app depends on dev’s network; it has no relationship to prod’s network. That isolation is automatic and is what makes blast radius predictable.

5. Deferred changes and planning against not-yet-created infrastructure

Here is the capability that is hard to replicate with workspaces. In a fresh deployment, a downstream component frequently needs values from an upstream component that does not exist yet. A classic example: the app component wants for_each over subnets, but on the very first apply the subnets are unknown.

In ordinary Terraform, for_each over an unknown value is a hard error — you are forced into multi-step applies and -target gymnastics. Stacks introduces deferred changes. When a plan depends on values that are not yet known, Stacks marks those changes as deferred instead of failing. It applies what it can now, learns the real values, and completes the deferred work on a subsequent plan/apply — all tracked as part of the same change, no manual targeting.

# modules/app/main.tf
# subnet_ids may be unknown on first apply; Stacks defers the
# dependent resources rather than erroring on unknown for_each keys.
resource "aws_instance" "web" {
  for_each = toset(var.subnet_ids)

  ami           = var.ami_id
  instance_type = "t3.small"
  subnet_id     = each.value
}

In the plan output you will see resources reported as deferred, with a clear note that they cannot be planned until upstream values resolve. The first apply provisions the network and any instances it can; once subnet IDs are concrete, the next run plans and applies the remaining instances. The operational win is that bootstrapping an entirely new environment becomes a normal apply, not a runbook.

6. Orchestration rules, auto-approve conditions, and rollout ordering

Stacks plans every affected deployment, but you decide which plans apply automatically and which wait for a human. That policy lives in orchestrate blocks in the deployment configuration. The most common rule type is auto_approve, which evaluates conditions against a plan and approves it when they hold.

# deployments.tfdeploy.hcl
# Auto-approve plans that contain no resource removals.
orchestrate "auto_approve" "no_deletes" {
  check {
    condition = context.plan.changes.remove == 0
    reason    = "Plan removes ${context.plan.changes.remove} resources; require manual review."
  }
}

The context object exposes facts about the plan being evaluated: change counts (add, change, remove), the deployment the plan targets, and per-component change totals you can branch on. Every check must pass for the plan to auto-approve; any failing check sends the plan to manual approval with the reason attached.

You can gate environments differently. A frequent pattern: let non-prod apply automatically when the plan is safe, and keep prod behind manual approval so that nothing destructive lands there unreviewed.

orchestrate "auto_approve" "safe_nonprod" {
  check {
    # Only auto-approve dev and staging; prod always waits for a human.
    condition = context.plan.deployment == deployment.dev || context.plan.deployment == deployment.staging
    reason    = "Manual approval is required outside dev and staging."
  }

  check {
    condition = context.plan.changes.remove == 0
    reason    = "Refusing to auto-approve a plan that removes ${context.plan.changes.remove} resources."
  }
}
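
Checks can also look at a single component rather than the whole plan. The rule below is a hypothetical sketch that holds any plan touching the network component; the rule name is made up, and the "component.network" key format follows the documented component_changes examples, so verify it against your version before relying on it.

```hcl
# Hypothetical rule: hold any plan that changes the network component.
orchestrate "auto_approve" "network_untouched" {
  check {
    condition = context.plan.component_changes["component.network"].total == 0
    reason    = "Network changes need manual review."
  }
}
```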

Because the dependency comes from real output references, rollout ordering between components is inherent: network before app, always. Ordering between deployments (promote dev, then staging, then prod) is not derived from the graph. You express it through approval policy: let dev and staging auto-approve, hold prod for manual approval, and approve prod only after the earlier deployments have applied cleanly. The key idea: orchestration policy is code in the Stack, reviewed like everything else, not tribal knowledge in a pipeline.

7. Operational concerns: state, drift, and observability per deployment

State. You do not manage backends. Stacks stores state per component, per deployment, inside HCP Terraform. There is no terraform.tfstate to lose, no S3 bucket plus DynamoDB lock table to provision before you can begin. The flip side is that terraform state surgery does not apply the same way; you work through Stack runs and the platform’s state handling rather than poking files.

Drift. Each deployment is reconciled against its own state, so drift is reported and corrected at the deployment-component granularity. A drifted security group in staging does not entangle prod. Because every deployment is a first-class object, you get a clear per-environment view of what changed and what is pending.

Observability. Treat each deployment as the unit you watch. Plans, applies, deferred changes, and approval status are all per deployment in the HCP Terraform UI and API. Stack outputs (section 4) are the contract you export to humans and downstream systems; keep sensitive ones flagged. When something looks off, the question is always “which deployment, which component,” and the model answers it directly.

Verify

Author locally, validate, then push to HCP Terraform to plan against real deployments.

Initialize and validate the Stack with the Stacks CLI. init resolves providers and modules; validate type-checks components, providers, and deployment inputs.

terraform stacks init
terraform stacks validate

Confirm provider wiring — validate fails if a component requests a provider the Stack does not pass in, or if required_providers is missing an entry. A clean validate means every providers = { ... } mapping is satisfied.
Inspect a plan and look for deferred changes. Trigger a plan (via VCS-connected Stack or CLI) and read the summary. On a brand-new deployment you should see resources marked deferred where they depend on not-yet-known upstream outputs — that is correct behavior, not an error.
Check orchestration decisions. In the run for each deployment, confirm dev/staging auto-approve under your orchestrate rules while any plan containing removals routes to manual approval with your reason string shown.
Verify per-deployment isolation. Make a no-op change scoped to one deployment’s inputs and confirm only that deployment re-plans; the others report no changes.
Read back Stack outputs and confirm sensitive values are redacted in plan output and surfaced only through the proper API/UI channels.

# List Stacks CLI subcommands available on your installed version.
terraform stacks --help

Checklist

Migration path from existing workspaces and Terragrunt

You do not rewrite everything at once. The realistic sequence:

Identify the component boundaries. Your existing root modules or Terragrunt units usually are your components — network, data, app. Promote each to a module under the Stack with clean variable/output interfaces. If a module currently reaches into another’s remote state with terraform_remote_state, that becomes a direct component.x.output reference, which is strictly better.
Replace per-environment roots with deployments. Every Terragrunt terragrunt.hcl that sets environment inputs, and every per-env workspace, collapses into one deployment block. The DRY that Terragrunt gives you through include and dependency is native in Stacks: shared structure lives in components, per-env values in deployments, and cross-unit dependencies are output references.
Move provider and backend config up. Delete backend blocks (Stacks owns state) and per-root provider configuration; declare providers once at the Stack level and pass them in. Terragrunt’s generated provider/backend files are no longer needed.
Import live infrastructure. For resources you must adopt rather than recreate, bring them under the Stack’s management deliberately, deployment by deployment, and verify a no-op plan before trusting it. Do not delete the old workspace until the corresponding deployment shows no drift.
Cut over one environment at a time. Bootstrap dev as a Stack, validate the orchestration and deferred-change behavior, then promote the pattern to staging and prod. Keep the legacy pipeline read-only during the overlap so nothing double-applies.
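
The first two steps are easiest to see side by side. In this sketch the legacy forms appear as comments (the bucket name and state key are hypothetical), followed by the Stacks component that replaces both:

```hcl
# Before (plain Terraform): the app root reads the network workspace's state.
#   data "terraform_remote_state" "network" {
#     backend = "s3"
#     config = {
#       bucket = "example-tfstate"            # hypothetical bucket
#       key    = "network/terraform.tfstate"  # hypothetical key
#       region = "us-east-1"
#     }
#   }
#   subnet_ids = data.terraform_remote_state.network.outputs.subnet_ids
#
# Before (Terragrunt): a dependency block in the app unit's terragrunt.hcl.
#   dependency "network" {
#     config_path = "../network"
#   }
#   inputs = {
#     subnet_ids = dependency.network.outputs.subnet_ids
#   }

# After (Stacks): one reference passes the value and declares the ordering.
component "app" {
  source = "./modules/app"

  inputs = {
    subnet_ids     = component.network.subnet_ids
    instance_count = var.instance_count
  }

  providers = {
    aws    = provider.aws.this
    random = provider.random.this
  }
}
```

The per-environment values that used to live in each terragrunt.hcl or workspace move into the matching deployment block's inputs.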

The destination is a single declaration of what your platform is, plus a short list of where it runs, with HCP Terraform computing the graph and driving the rollout. That replaces the configuration sprawl of workspace-per-environment and folds the orchestration logic you used Terragrunt for into the tool itself, which is exactly where it belongs.

Written by Vinod
