Terraform Lesson 11 of 57

Terraform State, In Depth: the State File, the state Commands, Locking & Sensitive Data

Every other Terraform concept — providers, the plan, modules, the dependency graph — ultimately serves one file. State is how Terraform remembers what it built. Take that file away and Terraform forgets it owns anything: the next plan proposes to create a duplicate of every resource that already exists, because as far as Terraform is concerned the world is empty. State is therefore simultaneously the most important and the most dangerous artefact in the entire toolchain — the thing that makes incremental, idempotent infrastructure possible, and the thing that turns a careless command into a 3 a.m. incident.

This lesson is the mechanics lesson for state. Two sister lessons cover the other halves of the story: Remote State at Scale is about operating state for big teams (splitting it, sharing it across stacks, the org-wide guardrails), and State Surgery is the incident-response playbook for when state is corrupt, locked by a dead process, or split-brained. This one stays deliberately at the foundation: what state actually is, what is inside the file byte for byte, the complete terraform state subcommand surface with every flag, how the addressing you feed those commands works, how locking protects you and how force-unlock betrays you, and the single most-asked interview fact about state — that it stores your secrets in plaintext. By the end you should be able to read a state file, reach for the right state subcommand without guessing, explain locking to a sceptical reviewer, and harden a backend so its secrets are not a liability. Everything here applies identically to OpenTofu (the open-source fork), with the one big exception I will flag loudly: OpenTofu can encrypt state client-side, and Terraform cannot.

Learning objectives

After working through this lesson you will be able to:

Prerequisites

You should already be comfortable with the core workflow — init, plan, apply, destroy — and able to read basic HCL (resources, variables, outputs). If those are shaky, start with Terraform Fundamentals, which introduces state at a high level; this lesson is the deep dive that grounds it in the actual file and commands. A free cloud account is not required: the entire hands-on lab uses the local, random, and null providers, so you can run every command on your laptop with nothing to clean up and zero spend. This sits in the State module of the Terraform Zero-to-Hero ladder, immediately after provisioners and immediately before the Terragrunt deep dive; it is the foundation the two scale/recovery state lessons build on, so do this one first.

What state is, and the three jobs it does

When you run terraform apply, Terraform creates real cloud resources — a VPC, a database, a DNS record — each of which the cloud gives back a unique identifier (vpc-0a1b2c3d, /subscriptions/.../virtualNetworks/vnet-hub, and so on). Your .tf files never mention those IDs; they only describe what you want. Something has to remember the bridge between “the resource I called aws_vpc.main in my code” and “the real object vpc-0a1b2c3d in the cloud”. That bridge is state.

State does three distinct jobs, and naming them tells you exactly why it cannot be thrown away:

Job | What it stores | What breaks without it
Mapping (the core job) | The binding from each configuration address (aws_vpc.main) to its real-world resource ID and provider | Terraform cannot tell “update the thing I already made” from “make a brand-new thing”, so it re-creates everything: duplicate resources, or destroy-and-recreate churn.
Metadata | Resource dependencies, provider configuration references, the schema version of each resource, and (for deletions) the order to tear things down | Terraform loses the dependency ordering it needs to destroy resources safely (e.g. delete the subnet before the VPC).
Performance / caching | The last-known attribute values of every managed resource | Without a cache Terraform would have to query the provider API for every resource on every plan; for large estates that is slow and rate-limit-prone. State lets plan diff against the cache and refresh selectively.

A useful one-sentence definition to keep: state is Terraform’s source of truth about what it manages, mapping your declared resources to real infrastructure plus the metadata needed to plan changes safely. It is not a backup of your infrastructure and it is not your configuration — it is the memory that connects the two.

Desired state vs current state vs real-world state

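Three “states” float around in conversation; an interviewer will check you can separate them. Desired state is what your .tf configuration declares. Stored state (often loosely called current state) is what the state file recorded the last time Terraform wrote it. Real-world state is what actually exists in the cloud right now.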

A terraform plan compares desired against stored (optionally refreshing stored from real-world first). When stored and real-world disagree because someone changed something in the console, that gap is drift — covered in the refresh section below and, in depth, in the surgery lesson.

Anatomy of the state file: never hand-edit this

State is a single JSON document. By default Terraform writes it to terraform.tfstate in your working directory (the local backend); with a remote backend the same JSON lives in S3, Azure Blob Storage, GCS, or Terraform Cloud instead, but the shape is identical. Pull a copy and look at the top:

terraform state pull > current.tfstate
jq '{version, terraform_version, serial, lineage}' current.tfstate
{
  "version": 4,
  "terraform_version": "1.9.5",
  "serial": 23,
  "lineage": "8f2a1c9e-4b3d-4a77-9f12-2d6e5a0b1c34"
}

Every field earns its place. Memorise this table — the top four fields show up in interviews constantly, and confusing version with terraform_version is a classic stumble:

Field | What it is | Why it matters
version | The state file format version (currently 4, in use since Terraform 0.12) | This is the schema of the JSON itself, not your Terraform binary version. Editing it by hand or mismatching it corrupts the file.
terraform_version | The Terraform/OpenTofu binary version that last wrote the state | Terraform refuses to operate on state written by a newer binary than yours, to avoid format surprises. This is why a teammate’s apply on a newer CLI can lock others out.
serial | A monotonically increasing integer, bumped on every write | The backend uses it for optimistic locking: a write whose serial is not higher than the stored one is a stale/lost write. A lower serial overwriting a higher one means you have lost changes.
lineage | A UUID generated once, when the state is first created | Identifies a single state’s history. Two state files with different lineage are not versions of one another; pushing across a lineage boundary is the canonical split-brain trigger.
outputs | The root-module output values, with their value, type, and a sensitive flag | This is how terraform output and terraform_remote_state read values, and why outputs (even “sensitive” ones) sit in state in plaintext.
resources | The array of every managed resource and data source, each with its module, mode (managed/data), type, name, provider, and an instances array | The heart of the file: the actual mapping. Each instance carries attributes (the cached values), schema_version, and for count/for_each, an index_key.
check_results | Results of check blocks and pre/postconditions from the last apply | Lets Terraform report assertion outcomes; you never edit it.

A single resource instance inside resources[].instances[] looks like this (trimmed):

{
  "mode": "managed",
  "type": "random_pet",
  "name": "server",
  "provider": "provider[\"registry.terraform.io/hashicorp/random\"]",
  "instances": [
    {
      "schema_version": 0,
      "attributes": {
        "id": "harmless-cougar",
        "length": 2,
        "separator": "-"
      },
      "dependencies": ["random_integer.seed"]
    }
  ]
}

Notice three things that drive every later decision. First, attributes is the performance cache — those are the values plan diffs against. Second, dependencies is the metadata that orders teardown. Third — and this is the security headline of the whole lesson — attributes holds whatever the resource exposes, verbatim and unencrypted. If this were a database resource, its admin password would be sitting right there in plaintext.

The one rule of the state file: never hand-edit it. It is tempting to open terraform.tfstate and tweak a value, but the JSON has invariants — the serial, the lineage, schema versions, dependency arrays, and a checksum on remote backends — that are easy to break and impossible to eyeball. Every legitimate change to state has a dedicated, safe command (the terraform state subcommands below) that maintains those invariants for you. If you genuinely must edit raw JSON in an incident, do it on a pulled copy with jq, bump the serial, and push it back — the technique is in the surgery lesson, and even there it is the tool of last resort.

The terraform state command family, exhaustively

terraform state is the safe, supported interface for inspecting and surgically modifying state without hand-editing JSON. Every subcommand operates on resource addresses (covered next) and most of the mutating ones automatically take a local backup to a timestamped *.backup file. Here is the complete surface, with the flags that matter:

Subcommand | What it does | Mutates state? | Key flags & notes
terraform state list | Lists the addresses of every resource (and data source) tracked in state | No (read-only) | Accepts one or more optional addresses to filter, such as a module path (state list module.network); -id=<id> filters by real-world ID; -state=<path> reads a specific local file. The first command to run when orienting yourself.
terraform state show <address> | Prints the attributes of one resource as stored in state, in HCL-like form | No (read-only) | Shows the cached values including IDs; great for finding a resource’s real ID before an import or rm. Attributes the provider marks sensitive are redacted as (sensitive value); use state pull when you need the raw values.
terraform state mv <src> <dst> | Renames/moves a resource’s address in state without touching the real resource | Yes | Use after a refactor (rename, wrap in a module, count -> for_each re-key). -state-out=<path> moves into a different state file; -dry-run previews; -lock-timeout. Prefer declarative moved {} blocks in config for routine refactors (see below).
terraform state rm <address> | Forgets a resource: removes it from state while leaving the real infrastructure running | Yes | Use to hand a resource to another state, or to drop a phantom entry whose backing object was deleted out of band. Does not destroy anything. Prefer the declarative removed {} block for reviewable, in-config removals.
terraform state pull | Downloads and prints the raw state JSON to stdout (works for local and remote backends) | No | The canonical way to back up (terraform state pull > backup.tfstate) and to inspect remote state with jq. Always do this before any mutating operation.
terraform state push <file> | Uploads a local state file to the configured backend | Yes (overwrites) | The most dangerous command in Terraform. Enforces the serial and lineage guards by default; -force bypasses them and is a footgun. Never use it unless you have personally reconciled both files.
terraform state replace-provider <from> <to> | Rewrites the provider source address for every resource in state in one shot | Yes | For registry namespace moves: the classic registry.terraform.io/-/aws -> registry.terraform.io/hashicorp/aws, or the Terraform↔OpenTofu split. -auto-approve skips the prompt; -lock-timeout.

A few important relatives that are not under terraform state but operate on the same data — know where the boundary is, because interviewers blur it:

Command | Relationship to state | Note
terraform import <address> <id> | Adds an existing real resource into state | The imperative form. Prefer the declarative import {} block (Terraform 1.5+ / OpenTofu): it is plan-reviewable and can generate config with -generate-config-out. Briefly here; depth in the surgery lesson.
terraform taint / untaint | taint marked a resource for replacement in state | Deprecated. Use terraform apply -replace=<address> instead; it is plan-visible and does not pre-mutate state.
terraform force-unlock <LOCK_ID> | Removes a lock, not a resource | A top-level command, not a state subcommand. Covered in the locking section.
terraform refresh | Updates attributes in state from the real world | Deprecated standalone; use terraform apply -refresh-only. See the refresh section.
terraform show | Renders the whole state (or a saved plan) as text/JSON | terraform show -json emits machine-readable state; read-only.
terraform output | Reads outputs out of state | -json/-raw; honours the sensitive flag (redacts in CLI but the value is still in state).

state mv/state rm vs moved {}/removed {} blocks. Modern Terraform gives you declarative equivalents for the two most common surgeries. A moved { from = ... to = ... } block in your config renames an address automatically on the next apply, and a removed { from = ... lifecycle { destroy = false } } block forgets a resource without destroying it — both live in code, show up in plan, and are reviewable in a pull request. Reach for the CLI state subcommands for one-off corrections, incident response, or cross-file moves; reach for the blocks for refactors that should be permanent and peer-reviewed. The blocks are the modern default for routine work.
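As a minimal sketch of the two declarative blocks, the fragment below reuses the addresses from this lesson's CLI examples (a logs bucket moving into a module, and a legacy replica that should be forgotten but left running). It assumes a module named logging that declares aws_s3_bucket.this already exists in the config, and that the resource block for the replica has already been deleted; to my knowledge the removed block needs Terraform 1.7 or later.

```hcl
# The bucket used to be declared at the root as aws_s3_bucket.logs and now
# lives in module "logging" (assumed to exist). The moved block tells
# Terraform both addresses refer to the same real bucket, so the plan shows
# a move rather than a destroy-and-create.
moved {
  from = aws_s3_bucket.logs
  to   = module.logging.aws_s3_bucket.this
}

# Stop managing the replica without destroying it. The matching resource
# block must no longer be present in the configuration.
removed {
  from = aws_db_instance.legacy_replica

  lifecycle {
    destroy = false
  }
}
```

Both blocks can be deleted from the config once every state that needed the change has been applied.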

Worked examples of each mutating command

These are the patterns you will actually type. Back up first, every time: terraform state pull > backup.tfstate.

# LIST — orient yourself
terraform state list
# random_pet.server
# local_file.greeting["app"]
# module.network.aws_subnet.private[0]

# SHOW — inspect one resource's stored attributes (and real id)
terraform state show 'local_file.greeting["app"]'

# MV — rename after refactoring a resource into a module
terraform state mv aws_s3_bucket.logs module.logging.aws_s3_bucket.this

# MV — re-key when converting count -> for_each (note the quotes!)
terraform state mv 'aws_instance.web[0]' 'aws_instance.web["az-a"]'

# RM — forget a resource without destroying the real thing
terraform state rm aws_db_instance.legacy_replica

# REPLACE-PROVIDER — after a registry namespace change
terraform state replace-provider \
  registry.terraform.io/-/aws \
  registry.terraform.io/hashicorp/aws

# PULL / PUSH — backup, then (rarely) restore an edited copy
terraform state pull > backup.tfstate
terraform state push edited.tfstate   # serial/lineage guards enforced

Resource addressing: the coordinates for every state op

Every state subcommand takes a resource address, and getting the syntax wrong is the single biggest source of friction. An address is built from up to four parts:

Part | Syntax | Example
Module path | module.<name> (repeat for nesting) | module.network.module.subnets
Resource mode + type + name | <type>.<name> (managed) or data.<type>.<name> (data source) | aws_subnet.private, data.aws_ami.ubuntu
count index | [<integer>] | aws_subnet.private[0]
for_each key | ["<string-key>"] | aws_subnet.private["az-a"]

Putting it together, a deeply nested, for_each-keyed address reads:

module.network.module.subnets.aws_subnet.private["az-a"]

Two rules save you from the most common mistakes:

  1. Quote addresses that contain brackets or quotes so your shell does not interpret them. terraform state show 'aws_subnet.private["az-a"]' — single-quote the whole thing. Forgetting this is why state mv aws_instance.web[0] ... mysteriously “does nothing”: the shell ate the brackets.
  2. count keys are integers, for_each keys are strings. aws_subnet.private[0] is a count instance; aws_subnet.private["a"] is a for_each instance. This is exactly why converting count to for_each requires a state mv to re-key each instance — the addresses genuinely change.

You can preview valid addresses any time with terraform state list, and validate a specific one with terraform state show <address>; if it prints attributes, the address is real.
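The integer-versus-string rule is easiest to see in a re-key. The sketch below assumes a hypothetical random_pet.node resource that was previously declared with count = 2 and is now keyed with for_each; the names node, az-a, and az-b are illustrative only. Inside a config file no shell quoting is needed, because the address is parsed by Terraform rather than by your shell.

```hcl
# Hypothetical resource, formerly declared with count = 2.
resource "random_pet" "node" {
  for_each = toset(["az-a", "az-b"])
  length   = 2
}

# Old integer-indexed instances map onto the new string-keyed ones.
moved {
  from = random_pet.node[0]
  to   = random_pet.node["az-a"]
}

moved {
  from = random_pet.node[1]
  to   = random_pet.node["az-b"]
}
```

A plan after this change should report the two instances as moved, with nothing to add or destroy.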

State locking: how concurrency stays safe

State has a fatal failure mode: two apply runs writing the same file at once. Run A reads serial 23, run B reads serial 23, both compute changes against that snapshot, both write — and whichever writes second clobbers the first, silently dropping resources from tracking. The fix is a lock: before any operation that could write state, Terraform asks the backend for an exclusive lock; a second run that wants the same lock either waits or fails fast. This is why a remote backend without locking is, in the words of the scale lesson, “just a more convenient way to corrupt state”.

Which operations lock

Anything that can mutate state acquires a lock: apply, destroy, plan (it may refresh and write), refresh, import, and the mutating state subcommands (mv, rm, push, replace-provider). Read-only commands — state list, state show, output, show — do not need a write lock. The local backend uses OS-level file locking; remote backends use a backend-specific mechanism:

Backend | Locking mechanism | What the lock is, physically | Notes
local | OS advisory file lock | A lock on terraform.tfstate on disk | Single-machine only; useless for teams.
s3 | Native S3 lockfile (use_lockfile = true, Terraform 1.10+/OpenTofu) or a DynamoDB table (legacy) | A sibling <key>.tflock object, or a DynamoDB item keyed by LockID | DynamoDB table needs a primary key named exactly LockID (String). The native lockfile removes that extra resource; prefer it on new setups.
azurerm | Native blob lease | A lease held on the state blob itself | No extra resource to provision: locking is built into the blob. The lease is held until it is released, so a crashed run can leave it stuck until you force-unlock or break the lease.
gcs | Native lockfile | A <key>.tflock object alongside the state | Locking is automatic; no extra config.
HCP Terraform / TFC (cloud block) | Managed | Internal to the platform | Locking, versioning, and encryption are all handled for you.
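For the s3 row, a backend block using the native lockfile might look like the sketch below. The bucket name and key are hypothetical placeholders, and use_lockfile assumes a CLI version that supports it, as noted in the table.

```hcl
terraform {
  backend "s3" {
    bucket       = "example-tfstate-bucket"    # hypothetical bucket name
    key          = "network/terraform.tfstate" # hypothetical state key
    region       = "us-east-1"
    encrypt      = true # server-side encryption at rest
    use_lockfile = true # lock is a sibling .tflock object, no DynamoDB table
  }
}
```

Changing backend settings requires re-running terraform init so the new configuration takes effect.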

The locking flags

Three CLI flags control locking on the commands that take a write lock:

Flag | Default | Effect | When to change it
-lock=true|false | true | Whether to acquire a lock at all | Almost never set false. Only in tightly controlled read-only automation where you are certain no write occurs. Disabling locking on an apply is how teams corrupt state.
-lock-timeout=<duration> | 0s (fail immediately) | How long to wait for a held lock before giving up | Set in CI (e.g. -lock-timeout=300s) so a benign concurrent run waits five minutes instead of failing instantly.
(the force-unlock command) | n/a | Manually removes a lock | Last resort, see below.
# CI-friendly: wait up to 5 minutes for a lock rather than failing fast
terraform apply -lock-timeout=300s

# A normal run holds the lock only for the duration of the operation
terraform plan

force-unlock: the loaded gun

When a CI job is killed mid-apply (OOM, cancelled pipeline, closed laptop), the lock can outlive the process. The next run fails with a message that includes the lock ID, who held it, the operation, and when:

Error: Error acquiring the state lock

Lock Info:
  ID:        f4c2b3a1-6d5e-4f8a-9b2c-1e7d3a0f5c91
  Operation: OperationTypeApply
  Who:       runner@ci-agent-07
  Created:   2026-06-15 09:14:22 +0000 UTC

You clear it with the ID from the error:

terraform force-unlock f4c2b3a1-6d5e-4f8a-9b2c-1e7d3a0f5c91

force-unlock does not check whether the lock-holder is actually gone. It removes the lock entry, full stop. If a teammate’s apply is still running and you break their lock, you now have two concurrent writers — exactly the corruption locking exists to prevent. Confirm the holding process is genuinely dead first (the CI job is terminated, the pipeline shows failed, the laptop is closed), then unlock. This is why a mature team treats force-unlock as a deliberate, logged, ideally two-person action and never wires it into a pipeline retry — an auto-unlock loop will eventually unlock a live apply.

If force-unlock itself cannot reach the backend (a deleted lock table, a corrupt lease), you clear the lock at the source — delete the DynamoDB item, break the Azure blob lease, or remove the .tflock object. Those backend-specific commands live in the surgery lesson; for everyday stuck locks, force-unlock <ID> after confirming the holder is dead is all you need.

Sensitive data in state: it is plaintext

This is the most important security fact in all of Terraform, and the single most common interview question on state. The state file stores resource attributes exactly as the provider returns them, in plaintext. That includes:

Marking a variable or output sensitive = true does not encrypt anything. It only redacts the value from CLI output and plan diffs (it prints (sensitive value) instead of the secret). The value still lands in terraform.tfstate in the clear. A reviewer who has only ever seen the redacted plan output is often shocked to grep the state file and find the password sitting there.

# The secret is hidden in plan/apply output...
terraform plan
#   + password = (sensitive value)

# ...but it is RIGHT THERE in state, in plaintext:
terraform state pull | jq '.resources[] | select(.type=="random_password") | .instances[0].attributes.result'
# "Xy9$kPzq2!mLr8Wd"

Where else secrets leak

State is the big one, but not the only one. Know the full surface:

Leak vector | What leaks | Mitigation
State file | Every sensitive attribute, in plaintext | Encrypt the backend at rest; restrict access; never commit .tfstate; OpenTofu client-side encryption.
Saved plan files (-out=plan.tfplan) | Plan files contain the values to be written, including sensitive ones, unencrypted | Treat *.tfplan as a secret artefact: short retention, restricted access, never commit.
CLI / CI logs | sensitive redacts most, but provider error messages or terraform output misuse can spill values | Mark variables/outputs sensitive; scrub CI logs; avoid -json output to shared logs.
terraform_remote_state outputs | Every output of the producer stack is readable by any consumer that can read the backend | Never put secrets in remote-state outputs. Fetch secrets from a secrets manager instead (see the scale lesson).
Version control | A committed .tfstate is a permanent credential leak in Git history | .tfstate and .tfstate.backup in .gitignore, always.
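To make the terraform_remote_state row concrete, here is a sketch of a consumer stack reading a producer's outputs. The bucket, key, and the vpc_id output name are hypothetical. The point is that the data source is granted read access to the whole state object, so whatever the producer exposes as a root output is available to this stack.

```hcl
# Consumer stack: reads the producer's state from the same backend.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-tfstate-bucket"    # hypothetical
    key    = "network/terraform.tfstate" # hypothetical
    region = "us-east-1"
  }
}

# Any root-module output of the producer is reachable through .outputs.
output "vpc_id" {
  value = data.terraform_remote_state.network.outputs.vpc_id
}
```

This is why the producer's outputs should carry only non-secret identifiers such as IDs and names.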

The mitigations, in order of leverage

  1. Encrypt the backend at rest (highest leverage, mandatory). Every production backend supports server-side encryption: encrypt = true on S3 (ideally with a customer-managed kms_key_id), storage-service encryption on Azure (optionally customer-managed keys), default encryption on GCS. This means state on disk in the cloud is ciphertext.
  2. Restrict access with least-privilege RBAC. The backend is a secrets store; treat it like one. Scope write access to the specific CI workload identity and a tiny break-glass group — on Azure, Storage Blob Data Contributor on the state container, not the subscription; on AWS, an IAM policy on the one bucket/key, not s3:*. Nobody should have standing write access to production state from a laptop.
  3. Never commit state, and isolate the backend on the network. .gitignore the local files; put the remote backend behind a private endpoint / VPC endpoint and deny public network access so state is never reachable from the open internet.
  4. Avoid putting secrets in state in the first place. Where possible, fetch secrets at apply time from a secrets manager (Vault, AWS Secrets Manager, Azure Key Vault) via a data source rather than generating them in Terraform, and pass ephemeral values (Terraform 1.10+) that are not persisted to state for the inputs that support them.
  5. OpenTofu client-side state encryption (the one big fork difference). OpenTofu (1.7+) can encrypt the entire state and plan files client-side, before they ever reach the backend, using AES-GCM with keys from PBKDF2 passphrases, AWS KMS, GCP KMS, or Azure Key Vault. Terraform has no equivalent — its only at-rest protection is the backend’s server-side encryption. If state-secret exposure is a hard compliance requirement, this is a genuine reason to choose OpenTofu.
# OpenTofu only: encrypt state and plan client-side before they hit the backend
terraform {
  encryption {
    key_provider "aws_kms" "this" {
      kms_key_id = "arn:aws:kms:us-east-1:111122223333:key/abcd-..."
      region     = "us-east-1"
      key_spec   = "AES_256"
    }
    method "aes_gcm" "this" {
      keys = key_provider.aws_kms.this
    }
    state { method = method.aes_gcm.this }
    plan  { method = method.aes_gcm.this }
  }
}
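Mitigation 4 can be sketched with an ephemeral resource feeding a write-only argument, so the password never lands in state. This leans on several assumptions that you should check against your provider docs: Terraform 1.11 or later (my understanding is that write-only arguments arrived one release after ephemeral values), a random provider release that ships the ephemeral random_password resource, and an AWS provider release in which aws_db_instance accepts password_wo and password_wo_version. All names and sizes are illustrative.

```hcl
# Generated fresh on each run and never persisted to state or plan files.
ephemeral "random_password" "db" {
  length = 20
}

resource "aws_db_instance" "example" {
  instance_class      = "db.t3.micro"
  allocated_storage   = 5
  engine              = "postgres"
  username            = "example"
  skip_final_snapshot = true

  # Write-only argument (assumed name): sent to the provider, not stored.
  password_wo = ephemeral.random_password.db.result
  # Bump this number when you want Terraform to send a new password.
  password_wo_version = 1
}
```

Because the value is not stored, Terraform cannot detect password drift; the version argument is the explicit trigger for rotation.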

refresh and -refresh-only: reconciling state with reality

State is a cache, and caches go stale when someone changes a resource outside Terraform (a console click, another tool, an autoscaler). That gap between stored and real-world state is drift. Terraform reconciles it by refreshing: querying each resource’s provider for its current attributes and updating the cache.

There are three ways this happens, and the trend is away from the blunt one:

Mechanism | What it does | Status / when
Implicit refresh during plan/apply | By default, plan refreshes state in-memory before computing the diff, so the plan reflects real-world drift | The normal path. Skip it with -refresh=false to plan against the cache only (faster, but blind to drift).
terraform plan -refresh-only | Computes a plan that only reconciles state with reality: it shows you drift without proposing to revert it | The safe, modern way to inspect drift. Pair with apply -refresh-only to adopt the real values into state.
terraform refresh | Updates state from the real world and writes immediately | Deprecated. It is exactly apply -refresh-only -auto-approve with no review step, which is why it was deprecated in favour of the reviewable form. Avoid it.

The discipline that prevents an outage: when you suspect drift, inspect before you clobber.

# SAFE: see what drifted, change nothing
terraform plan -refresh-only

# Decide per attribute, then either ADOPT reality into state...
terraform apply -refresh-only

# ...or REVERT to code with a normal apply (plans the resource back to config)
terraform apply

The trap is panicking and running a bare terraform apply, which reverts a legitimate emergency hotfix the on-call engineer made at 3 a.m. -refresh-only first, decide second. (The full drift-vs-reality decision tree, including ignore_changes, is in the surgery lesson.)

Importing into state, briefly

When a resource already exists in the cloud but not in state — created by ClickOps, another tool, or a previous Terraform run whose state was lost — you bring it under management with import. This adds a mapping to state without creating anything. The modern form is the declarative import {} block:

import {
  to = aws_s3_bucket.logs
  id = "kloudvin-logs-prod"
}

resource "aws_s3_bucket" "logs" {
  # config matching the live bucket
}
terraform plan   # confirm "1 to import, 0 to change" before applying
terraform apply

Prefer the block over the legacy terraform import <address> <id> CLI: the block is plan-reviewable, batchable, and can scaffold config with terraform plan -generate-config-out=generated.tf. Each resource type has its own import-ID format (an aws_route53_record is ZONEID_name_type, not just an ARN), so check the provider docs’ “Import” section. This is the headline; the full import-and-rebuild workflow — including rebuilding an entire lost state from scratch — lives in the State Surgery lesson.

Terraform state mechanics: the state file mapping config to real resources, the terraform state command surface, locking, and sensitive-data flow

The diagram traces the full picture: your configuration and the real cloud on either side, the state file in the middle holding the mapping plus cached attributes (with secrets in plaintext), the lock that serialises writes, and the terraform state subcommands acting on addresses within the file.

Hands-on lab

This lab needs no cloud account and costs nothing — it uses the random, local, and null providers, all of which run entirely on your machine. You will create state, read it with every read-only state command, perform safe surgery with mv and rm, prove that a “sensitive” value is plaintext in state, and trigger and inspect a lock. Allow about 20 minutes.

1. Scaffold the project.

mkdir tf-state-lab && cd tf-state-lab
# main.tf
terraform {
  required_version = ">= 1.9"
  required_providers {
    random = { source = "hashicorp/random", version = "~> 3.6" }
    local  = { source = "hashicorp/local",  version = "~> 2.5" }
  }
}

resource "random_password" "db" {
  length  = 20
  special = true
}

resource "random_pet" "server" {
  length = 2
}

resource "local_file" "greeting" {
  for_each = toset(["app", "web"])
  filename = "${path.module}/hello-${each.key}.txt"
  content  = "Hello from ${each.key}: ${random_pet.server.id}"
}

output "db_password" {
  value     = random_password.db.result
  sensitive = true
}

2. Init and apply.

terraform init
terraform apply -auto-approve

Expected (abridged):

random_password.db: Creation complete after 0s
random_pet.server: Creation complete after 0s
local_file.greeting["app"]: Creation complete after 0s
local_file.greeting["web"]: Creation complete after 0s

Apply complete! Resources: 4 added, 0 changed, 0 destroyed.

Outputs:
db_password = <sensitive>

3. Read state with the read-only commands.

terraform state list
# local_file.greeting["app"]
# local_file.greeting["web"]
# random_password.db
# random_pet.server

terraform state show 'local_file.greeting["app"]'
# shows filename, content, the file's id (a content hash), etc.

4. Prove the “sensitive” output is plaintext in state. This is the key learning moment:

# Redacted when you list all outputs:
terraform output
# db_password = <sensitive>
# (querying it by name, or with -raw / -json, prints the value in the clear)

# But the raw secret is sitting in state unencrypted:
terraform state pull | jq -r '.resources[] | select(.type=="random_password") | .instances[0].attributes.result'
# e.g. Xy9$kPzq2!mLr8Wd   <- plaintext

5. Back up, then do safe surgery with mv and rm.

# ALWAYS back up before mutating
terraform state pull > backup.tfstate

# Rename random_pet.server -> random_pet.host (state mv; real value untouched)
terraform state mv random_pet.server random_pet.host
# Now update main.tf: rename the resource block to "host" and change the
# reference in local_file.greeting to random_pet.host.id, then:
terraform plan      # should show NO changes if the rename is consistent

# Forget one file from state WITHOUT deleting the file on disk
terraform state rm 'local_file.greeting["web"]'
ls hello-web.txt    # the file still exists on disk
terraform state list | grep web || echo "no longer in state"

If you skipped editing main.tf after the state mv, plan will instead show a create of random_pet.server and a destroy of random_pet.host, which no longer has a matching block in config. That is a perfect illustration of why state mv and config must change together (or why a moved {} block, which records the rename in the config itself, is safer).
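For comparison, the same rename done declaratively might look like the following in main.tf. This is a sketch of an alternative to the CLI state mv in step 5, not an extra step: if you take this route, skip the state mv and let the next apply record the move.

```hcl
# The block is renamed from "server" to "host".
resource "random_pet" "host" {
  length = 2
}

# Tells Terraform the old and new addresses are the same object.
moved {
  from = random_pet.server
  to   = random_pet.host
}

# The reference must follow the rename, or the config will not validate.
resource "local_file" "greeting" {
  for_each = toset(["app", "web"])
  filename = "${path.module}/hello-${each.key}.txt"
  content  = "Hello from ${each.key}: ${random_pet.host.id}"
}
```

A plan should then report random_pet.server as moved to random_pet.host with no resources replaced, since the pet's id is unchanged.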

6. Trigger and inspect a lock (local backend). Open a second terminal in the same directory and run a long no-op while holding the lock; in the first terminal a concurrent op will report the lock. With the local backend the window is tiny, so the cleanest demonstration is to read the lock info Terraform prints on contention, and to practise the unlock command syntax:

# If you ever see "Error acquiring the state lock", note the ID and (after
# confirming no live writer) clear it:
# terraform force-unlock <LOCK_ID>

7. Validation. Confirm state is healthy and matches config:

terraform plan -detailed-exitcode
#   exit 0 = no changes (clean)   exit 2 = changes pending   exit 1 = error

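A -detailed-exitcode of 0 proves stored state, config, and the real world all agree. If you ran the state rm in step 5, expect exit 2 at first, because local_file.greeting["web"] is still in the config but no longer in state; run terraform apply to bring it back under management, then check again.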

Cleanup.

terraform destroy -auto-approve
rm -f backup.tfstate hello-*.txt
# optionally: rm -rf .terraform .terraform.lock.hcl terraform.tfstate*

Cost note. Zero. The random, local, and null providers create nothing in any cloud — there is no spend and nothing to leak. This is the recommended way to practise all state operations safely before you ever point them at production.

Common mistakes & troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| state mv 'aws_instance.web[0]' ... “does nothing” or errors | Shell ate the brackets/quotes | Single-quote the entire address: 'aws_instance.web[0]'. |
| Plan wants to create a resource that already exists | Resource is not in state (lost state, never imported, or state rm’d) | Add an import {} block for the existing resource; confirm “1 to import, 0 to change”. |
| Error acquiring the state lock on every run | Stale lock from a killed apply | Confirm no live writer, then run terraform force-unlock <ID> with the ID from the error. |
| terraform apply reverted a legitimate console hotfix | Ran a bare apply on drift instead of inspecting | Use plan -refresh-only first; apply -refresh-only to adopt, plain apply to revert; decide per attribute. |
| Error: state snapshot was created by Terraform vX.Y, newer than current | A teammate wrote state with a newer CLI | Upgrade your CLI to match (tfenv/tfswitch); never downgrade state by hand. |
| Secret found in terraform.tfstate despite sensitive = true | sensitive only redacts output, never encrypts state | Encrypt the backend, restrict access, never commit state; use OpenTofu state encryption for client-side protection. |
| state push rejected: “serial is older” / lineage mismatch | Pushing a stale or foreign state file | Bump serial above the backend’s current value (after reconciling); never -force across a lineage boundary (see the State Surgery lesson). |
| Plan is slow / hits provider rate limits on a big estate | Full refresh of a huge state on every plan | Split state along lifecycle seams (see the Remote State at Scale lesson); use -refresh=false for quick iteration when you know nothing drifted. |
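The quoting problem in the first row is easy to see without touching any state. In the sketch below, printf stands in for terraform so you can inspect what the shell would actually pass; the address is the one from the lab, and the behaviour described assumes bash or a POSIX sh with default globbing options.

```shell
# Sketch: see what the shell actually passes to terraform for a for_each address.
# printf stands in for terraform here so nothing touches real state.

# Unquoted: the shell strips the inner double quotes before the program sees them.
unquoted=$(printf '%s' local_file.greeting["web"])

# Single-quoted: the address arrives exactly as typed.
quoted=$(printf '%s' 'local_file.greeting["web"]')

echo "unquoted -> $unquoted"
echo "quoted   -> $quoted"
```

The unquoted form arrives as local_file.greeting[web], which is not a valid for_each address, while the single-quoted form keeps the ["web"] key intact. Other shells can behave differently (zsh, for example, may refuse the unquoted form as an unmatched glob), which is one more reason to single-quote every address.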

Best practices

Security notes

State is a high-value secrets store; the controls are non-negotiable.

Interview & exam questions

1. What is Terraform state and why can it not be regenerated from .tf files? State is the mapping from your declared resources to their real-world IDs, plus metadata (dependencies) and a cache of last-known attributes. The configuration is reproducible, but the binding between aws_vpc.main and vpc-0a1b2c3d is not — only state holds it, which is why losing state means Terraform proposes to recreate everything.

2. What are the three jobs state does? Mapping (config address → real ID, the core job), metadata (dependencies and teardown ordering), and performance (caching attributes so plan need not query every resource every time).

3. Difference between version, terraform_version, serial, and lineage in the state file? version is the state format version (4). terraform_version is the binary that last wrote it. serial is a per-write counter used for optimistic locking (a lower serial overwriting a higher one means lost writes). lineage is a UUID identifying one state’s history — different lineages are not versions of each other and pushing across them causes split-brain.

4. Does sensitive = true encrypt the value in state? No — and this is the classic trap. sensitive only redacts the value from CLI/plan output. The value is stored in state in plaintext. Protect it via backend encryption, access control, and (in OpenTofu) client-side state encryption.

5. Walk through the terraform state subcommands. list (addresses), show (one resource’s attributes), mv (rename/move without touching the resource), rm (forget without destroying), pull (dump raw JSON, for backup), push (upload a local file — most dangerous), replace-provider (rewrite provider source addresses). list/show/pull are read-only; the rest mutate and should be preceded by a backup.

6. How does state locking work, and which backends provide it? Before a write, Terraform acquires an exclusive lock from the backend; a second run waits (-lock-timeout) or fails. S3 uses a native .tflock (1.10+) or a DynamoDB LockID table; Azure uses a blob lease; GCS uses a .tflock; HCP/TFC manages it. A backend without locking can be corrupted by concurrent applies.

7. What does force-unlock do and what is the danger? It removes a lock entry by ID without verifying the holder is gone. If a real apply is still running, breaking its lock creates two concurrent writers and corrupts state. Always confirm the holding process is dead first; never automate it in a retry loop.

8. terraform state rm vs terraform destroy — what is the difference? destroy deletes the real infrastructure and removes it from state. state rm removes the resource from state only, leaving the real infrastructure running (Terraform simply forgets it). Use rm to hand a resource to another state or drop a phantom entry.

9. Why is terraform refresh deprecated, and what replaces it? The standalone command writes refreshed state with no review — it is effectively apply -refresh-only -auto-approve. Use terraform plan -refresh-only to see drift and apply -refresh-only to adopt it, so the reconciliation is reviewable.

10. You renamed a resource and now plan wants to destroy the old one and create a new one. What happened and how do you fix it without downtime? The address changed, so Terraform sees the old address as gone and the new one as new. Fix it by telling Terraform it is the same object: add a moved { from = old to = new } block (preferred, reviewable) or run terraform state mv old new. Then plan shows no changes.

11. How do you bring an existing, manually-created resource under Terraform management? Import it. Write an import { to = <address> id = <real-id> } block plus a matching resource block, plan to confirm “1 to import, 0 to change”, then apply. Prefer the block over the legacy terraform import CLI because it is plan-reviewable and can generate config.

12. Where, besides the state file, can secrets leak — and how do you stop it? Saved plan files (*.tfplan) contain values unencrypted; CI logs can spill via error messages or output misuse; terraform_remote_state exposes every producer output to any consumer; a committed .tfstate is a Git-history leak. Mitigate with backend encryption + RBAC, secret plan-file handling, no secrets in remote-state outputs, .gitignore for state/plans, and OpenTofu client-side encryption.
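The .gitignore mitigation from question 12 can be sketched as below. This is a minimal example, assuming saved plans use the .tfplan extension mentioned above and that you run it from the root of the Terraform project; adjust the patterns to your own naming.

```shell
# Sketch: keep state and saved plans out of Git.
# Assumption: saved plans use a .tfplan extension, as in the answer above.
cat >> .gitignore <<'EOF'
# Terraform state and backups (contain plaintext secrets)
*.tfstate
*.tfstate.*
# Saved plan files (contain unencrypted values)
*.tfplan
# Local provider/module cache
.terraform/
EOF

# Inside a Git repository, list any state or plan files that are already tracked:
#   git ls-files -- '*.tfstate' '*.tfstate.*' '*.tfplan'
```

Ignoring a file does not remove copies that are already committed; anything the git ls-files check turns up still lives in history and its secrets should be treated as exposed.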

Quick check

  1. True or false: marking an output sensitive = true encrypts it inside the state file.
  2. Which terraform state subcommand forgets a resource without destroying the real infrastructure?
  3. In the address module.net.aws_subnet.private["az-a"], which part tells you the resource uses for_each rather than count?
  4. What is the danger of running terraform force-unlock without checking first?
  5. Which command safely shows you drift without proposing to revert it?

Answers

  1. False. sensitive only redacts CLI/plan output; the value is stored in state in plaintext. Protect it with backend encryption, access control, or OpenTofu client-side state encryption.
  2. terraform state rm — it removes the resource from state only; the real resource keeps running.
  3. The ["az-a"] string key. for_each instances are keyed by string (["az-a"]); count instances are keyed by integer ([0]).
  4. It removes the lock without confirming the holder is dead, so if a real apply is still running you get two concurrent writers and corrupted state.
  5. terraform plan -refresh-only. It compares state against reality and reports drift without proposing to change any resource.

Exercise

Starting from the lab project:

  1. Add a second, cloud-free resource using for_each over a map (e.g. another local_file keyed by { api = "8080", ui = "3000" }) so you have a richer state to operate on. Apply.
  2. Convert one of your existing for_each resources to a different key set (drop one key, add one) and observe in the plan how for_each adds and destroys only the changed keys, leaving others untouched — contrast this mentally with how count would have shuffled every index.
  3. Back up state (terraform state pull > backup.tfstate), then use terraform state mv to re-key one instance to a new for_each key, update the config to match, and prove with terraform plan that there are now no changes — demonstrating a non-destructive re-key.
  4. Use terraform state pull | jq to locate the plaintext value of your random_password and write one sentence on exactly which backend control you would add to protect it in production.
  5. Run terraform plan -detailed-exitcode and record the exit code; explain what 0 versus 2 would each mean. Finish with terraform destroy.

Write two or three sentences on the difference you observed between state mv (imperative, CLI) and a moved {} block (declarative, in-config) for the re-key — this is a common interview discriminator.

Certification mapping

This lesson maps to the HashiCorp Certified: Terraform Associate (003) exam, principally the objective “Implement and maintain state”: the purpose of state, local vs remote state and backends, state locking and force-unlock, sensitive data in state and how to protect it, and the terraform state subcommands (list, show, mv, rm, pull, push, replace-provider). It also touches “Use Terraform outside the core workflow” (import, the import {} block, state surgery, -replace superseding taint) and “Read, generate, and modify configuration” (resource addressing, moved/removed/import blocks). Expect several questions that hinge on the two facts this lesson hammers: sensitive does not encrypt state, and state rm does not destroy the resource. The companion Terraform Associate Prep Kit drills these as practice questions.

Glossary

Next steps

You can now read a state file, reach for the right terraform state subcommand without guessing, address any resource precisely, reason about locking, and harden a backend against the plaintext-secrets problem. The next move is to stop running raw terraform per environment and let a thin wrapper generate your backend, providers, and remote-state wiring for you. Continue with Terragrunt Configuration, In Depth, which dissects every block, function, and hook in terragrunt.hcl — including the remote_state and generate blocks that turn the backend boilerplate you configured by hand here into one DRY definition across every stack. For the operational and recovery sides of state, go deeper with Remote State at Scale (splitting, cross-stack sharing, org guardrails) and State Surgery (corruption, split-brain, and rebuilding lost state).
