Building a Platform Layer with Azure Verified Modules and Terraform

Most teams that adopt Azure Verified Modules (AVM) stop at “I called a module and got a resource.” That is the demo, not the value. The real win is using AVM as the substrate for an opinionated platform layer your application teams consume without ever touching a raw azurerm_ resource — a layer that bakes in private endpoints, diagnostic settings, mandatory tags and naming as types, validated at plan, so a publicly-exposed Key Vault is not something an app team can ship by accident because the lever does not exist. This guide builds that layer end to end: composing AVM resource modules into your own pattern modules, pinning them sanely (the pre-1.0 ~> trap that nukes 40 data planes in one Renovate merge), testing them at two altitudes, and shipping them through a private registry — with the state-migration discipline that keeps every upgrade a zero-destroy event.

The thing that makes this hard is not Terraform syntax; it is that AVM modules are pre-1.0, so the version-constraint intuition every engineer carries (~> 0.9 is “patches only”) is exactly wrong, and the place that intuition fails is the shared wrapper that 40 repos depend on. The thing that makes it worth doing is that once the wrapper exists, your platform team upgrades the entire estate by merging one pinned-version PR with a reviewed plan diff, and your app teams ship spokes, vaults and storage that are private, tagged and observable by construction. This article is the reference you keep open while you build that: every interface input, every pin rule, every test layer, every migration block, laid out as scannable tables so you read the prose once and then work from the tables.

By the end you will stop treating AVM as a fancier resource and start treating it as the brick library underneath your org’s non-negotiables. You will know precisely which version constraint pins a 0.x module without admitting a breaking minor, which inputs to expose and which to weld shut, how to assert a wrapper’s shape at plan without deploying, how to run a real apply/destroy against an ephemeral subscription with keyless OIDC, how to publish to a private registry on a semver tag, and — the part that separates a senior platform engineer from someone who read the README — how to absorb AVM’s internal resource-address churn inside the wrapper so a minor bump never shows up as destroy/create in a consumer’s plan.

What problem this solves

The pain this solves is module sprawl plus silent drift from your standards. Before a platform layer, every app team copies a different community module (or hand-rolls azurerm_ resources), each with different input names, none of which reliably support diagnostic settings, locks, role assignments or private endpoints. Security finds a publicly-accessible storage account in a quarterly review; the team that built it points at a wiki page nobody read. “Please tag things” and “please use private endpoints” live as documentation, which means they live as suggestions. Multiply by 40 repos and you have an estate you cannot reason about, where the answer to “is everything private?” is “let me go check, repo by repo.”

What breaks without this layer: governance becomes archaeology. You cannot upgrade a defaulting convention across the estate because there is no single place that owns it. A new compliance rule (force-Entra-auth on storage, deny public Key Vault) becomes 40 pull requests against 40 inconsistent codebases instead of one version bump. And when you do try to standardise by swapping hand-rolled modules for AVM, the naive attempt shows every storage account scheduled for destroy/create in the plan — because the resource address changed — so the migration gets reverted and “AVM doesn’t work for us” enters the team’s folklore.

Who hits this: every platform / cloud-engineering team operating Terraform at more than a handful of repos, especially under a landing-zone program where the Azure Cloud Adoption Framework landing zones define the guardrails but leave how app teams provision inside a spoke to you. It bites hardest where pre-1.0 AVM modules are pinned with a two-part ~> (the breaking-minor trap), where wrappers are accidental passthroughs of the full AVM surface (no guardrail value), and where nobody reads the migration plan before merging a Renovate AVM bump.

To frame the whole field before the deep dive, here is every layer of the module supply chain this article builds, who owns it, and the single failure that bites at each:

Layer	What it is	Who owns it	The failure that bites here
Upstream AVM resource module (`avm-res-*`)	One logical resource + its children, WAF-aligned	Microsoft	Pre-1.0: `~> 0.x` admits a breaking minor
Upstream AVM pattern module (`avm-ptn-*`)	A multi-resource architecture (hub-spoke, LZ)	Microsoft	Heavier blast radius on a bad bump
Your platform wrapper	Org pattern composed from AVM bricks + injected policy	Platform team	Accidental passthrough = no guardrail
Test + release gate	`terraform test` + Terratest + replace-gate in CI	Platform team	Unacknowledged `destroy`/`create` ships
Private registry / git ref	Versioned, semver-tagged distribution	Platform team	Copy-paste instead of `source`/`version`
App-team consumption	Narrow inputs only; deploy into Azure	App teams	A leaked lever lets them go public

Learning objectives

By the end of this article you can:

Distinguish AVM resource modules from pattern modules, and place your wrapper as a deliberate third tier that composes resource bricks and injects org policy — without forking AVM.
Read an AVM module’s real interface (tags, lock, role_assignments, diagnostic_settings, private_endpoints, managed_identities, enable_telemetry) instead of guessing from prose.
Pin AVM dependencies correctly for pre-1.0 modules: why ~> 0.9 is dangerous, why ~> 0.9.1 is the safe range form, and why exact pins belong in the platform layer while ~> X.Y.Z belongs in app repos.
Compose AVM resource modules into a pattern wrapper that forces private endpoints, diagnostics and tags as non-negotiable inputs, with validation blocks that turn conventions into hard plan-time failures.
Test wrappers at two altitudes — fast terraform test plan-level contract assertions, and nightly Terratest against an ephemeral subscription with OIDC keyless auth.
Publish wrappers to a private registry (Terraform Cloud/Enterprise) or a versioned git ref (Azure DevOps), and consume them by source/version with a semver contract.
Migrate hand-rolled modules to AVM without state churn using moved and import blocks, and gate CI so an unacknowledged destroy/create can never merge.
Automate AVM upgrades with Renovate so each bump is one reviewable PR carrying a terraform plan diff.

Prerequisites & where this fits

You should be comfortable with core Terraform: HCL, providers, the init/plan/apply workflow, modules with inputs and outputs, and remote state. If any of that is shaky, the Terraform fundamentals: HCL, providers, state & workflow and Terraform state deep dive come first. Module authoring conventions — inputs, outputs, versioning — are assumed from Authoring Terraform modules: structure, inputs, outputs, versioning. You should know what a version constraint means in principle (we will sharpen it for 0.x), and have an Azure subscription plus the azurerm provider configured.

This sits at the infrastructure-as-code / platform-engineering layer of an Azure estate. It assumes the landing-zone scaffolding above it — management groups, policy, the hub — from Azure Cloud Adoption Framework landing zones, and it produces the spokes app teams deploy into. It pairs with Terraform module design: composition, versioning (the composition theory), Terraform testing: native & Terratest (the test mechanics), and Terraform refactoring: moved, import & removed blocks (the migration mechanics this article applies to AVM specifically). For teams that prefer Bicep, the equivalent distribution story is Bicep private module registry with ACR & CI/CD.

A quick map of who confirms what when something goes wrong, so you route a problem to the right layer fast:

Concern	Where it lives	Confirm with	Owns the fix
“Which AVM version actually resolved?”	`.terraform/modules/modules.json`	`terraform init` output / read `modules.json`	Platform team
“Why is the plan showing a replace?”	Wrapper resource addresses	`terraform show -json` + `jq`	Platform team
“Why did the run error on a deployment?”	`enable_telemetry` in a locked sub	Apply error text	Platform + governance
“Why can this team go public?”	Wrapper `variables.tf` surface	grep for the exposed lever	Platform team
“Is the published version right?”	Registry / git tag	`terraform init` in a consumer	Platform + app team
“Did the migration churn state?”	Plan actions on adopt	replace-gate in CI	Platform team

Core concepts

Five mental models make every later decision obvious.

AVM is a specification, not just a module set. The reason AVM is worth building on is not “Microsoft published modules”; it is that every module conforms to the same interface contract: consistent input names, mandatory support for diagnostic settings, locks, role assignments, and (where the service supports them) private endpoints, plus Well-Architected (WAF) defaults rather than the bare minimum that compiles. You learn one shape and it generalises across services. That shared shape is what lets you write generic org policy (force diagnostics everywhere) instead of bespoke wiring per resource.

There are two AVM module classes, and your wrapper is a third tier. AVM ships resource modules (Azure/avm-res-<service>-<resource>/azurerm) — one logical resource plus its directly-dependent children — and pattern modules (Azure/avm-ptn-<pattern>/azurerm) — a whole multi-resource architecture. The mental model: resource modules are LEGO bricks; pattern modules are pre-built assemblies. Your platform layer is neither — it is a third tier: your own pattern modules, composed from AVM resource bricks, that encode your org’s non-negotiables. You generally do not fork AVM; you wrap it.

Pre-1.0 changes the meaning of ~>. This is the single most consequential fact in the article. AVM resource modules are below 1.0, and AVM treats the minor segment as the breaking-change segment while below 1.0. So ~> 0.9 (which feels like “0.9.x only”) actually expands to >= 0.9.0, < 1.0.0 and will happily pull a breaking 0.10.0. The constraint that pins to a non-breaking range is the three-part ~> 0.9.1 (allows 0.9.1 .. 0.9.x, blocks 0.10.0). If you remember one thing, remember this.

The wrapper’s value is what it does not expose. A platform module is valuable in proportion to the levers it removes. If your variables.tf mirrors the AVM module’s inputs, you have built a passthrough, not a platform — an app team can still ship a public Key Vault. The discipline is to expose a narrow contract (workload name, tags, the central LAW id) and inject the rest (public_network_access_enabled = false, enable_telemetry = false, forced diagnostics and private endpoints) as constants the caller cannot override. Guardrails as types, validated at plan, not as a wiki page.

Every AVM upgrade is a potential state migration. Exact version pins control when you take an upgrade, not whether it is safe. A minor bump can move a resource under a for_each map, changing its resource address — and a changed address means Terraform plans destroy + create, which on a storage account is a data-plane deletion. The senior move is to read every AVM bump as a possible state migration, absorb the address change inside the wrapper with a moved block shipped in the same version, and gate CI so an unacknowledged replace can never merge.
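
As a concrete picture of what absorbing an address change means, here is a minimal sketch of a moved block. The addresses are hypothetical (a hand-rolled azurerm_storage_account.this being adopted into a module call named sa); the real internal address has to be read from the module source or from terraform state list, and whether Terraform accepts a particular move depends on your Terraform version and on where the block is declared, so confirm it with a plan.

```hcl
# Hypothetical addresses for illustration only; read the real ones from
# `terraform state list` and the module source before relying on this.
moved {
  from = azurerm_storage_account.this           # old, hand-rolled address
  to   = module.sa.azurerm_storage_account.this # assumed address inside the module call
}
```

With the block in place, the plan should report the resource as moved rather than scheduling a destroy and create. If it still shows a replace, either the target address is wrong or some attribute difference is forcing replacement on its own.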

The vocabulary in one table

Pin down every moving part before the deep sections; the glossary repeats these for lookup, this is the mental model side by side:

Concept	One-line definition	Where it lives	Why it matters
Resource module (`avm-res-*`)	One logical resource + children	Public Terraform registry	The brick you compose
Pattern module (`avm-ptn-*`)	A multi-resource architecture	Public registry	A pre-built assembly
Platform wrapper	Your pattern over AVM bricks	Your private registry / repo	Encodes org non-negotiables
AVM interface	The shared optional input contract	Each module’s `variables.tf`	Lets you write generic policy
`enable_telemetry`	Empty ARM deployment for usage metrics	An AVM input (default `true`)	Fails the run in locked subs
`~> 0.9.1` vs `~> 0.9`	Three-part vs two-part 0.x pin	Module `version` arg	One blocks breaking minors, one doesn’t
`validation` block	Custom input precondition	Wrapper `variables.tf`	Turns conventions into plan failures
`terraform test`	Native plan/apply assertion runner	`tests/*.tftest.hcl`	Fast contract checks, no deploy
Terratest	Go E2E apply/assert/destroy	`test/*.go`	Real Azure validation, nightly
`moved` block	Declares old→new resource address	Wrapper `.tf`	Absorbs AVM address churn
`import` block	Brings existing Azure into state	Wrapper / consumer `.tf`	Brownfield adoption, no recreate
Replace gate	CI check rejecting `destroy`+`create`	Pipeline step	Stops accidental data-plane loss

Why AVM exists: the resource vs. pattern split

AVM is Microsoft’s effort to replace the sprawl of inconsistent community modules with a single, owned, specification-driven set. Two things make it worth building on:

A shared specification: every module conforms to the same interface contracts (consistent input names, mandatory support for diagnostic settings, locks, role assignments, and, where relevant, private endpoints), so you learn one shape and it generalises.
WAF alignment: modules encode Well-Architected defaults rather than the bare minimum that compiles.

The two module classes you actually compose with are the resource and pattern modules, and your platform layer is a third tier over them.

Class	Terraform registry prefix	Scope	When you reach for it
Resource module	`Azure/avm-res-<service>-<resource>/azurerm`	One logical resource + its directly dependent child resources	The brick for your own pattern
Pattern module	`Azure/avm-ptn-<pattern>/azurerm`	A multi-resource architecture (hub-spoke, AKS landing zone)	A whole assembly you accept as-is
Platform wrapper (yours)	your private registry / git ref	Org pattern composed from AVM resource bricks + injected policy	What app teams actually consume

The decision of which tier to consume, by situation:

If you need…	Consume	Why
A single Key Vault with org defaults	Resource module, wrapped	You inject policy the bare brick doesn’t enforce
A whole hub-spoke exactly as Microsoft ships it	Pattern module directly	No org-specific deltas; accept the assembly
A spoke with your naming, tags, PE, diagnostics	Your wrapper over resource bricks	The pattern module won’t encode your non-negotiables
A one-off experiment / spike	Resource module directly	Not worth a wrapper yet
To change a default across 40 repos	Your wrapper (one bump)	The only place that owns the convention

Why build on AVM at all rather than community modules or raw resources — the three approaches side by side on the axes that matter at estate scale:

Axis	Raw `azurerm_` resources	Community modules	AVM (wrapped)
Interface consistency	None (you write it all)	Varies wildly per author	Mandated, identical shape across services
Diagnostics / locks / PE support	Hand-wired each time	Sometimes, inconsistently	First-class, standard inputs
Defaults	Whatever you type	Author’s opinion	WAF-aligned (good baseline)
Ownership / maintenance	You own everything	Author may abandon it	Microsoft-owned, supported
Upstream fixes	N/A	If the author ships them	Flow to you (you compose, not fork)
Org policy injection	Manual, per resource	Fork or pray	Inject once in your wrapper tier
Estate-wide change	N PRs, N codebases	N PRs	One wrapper bump

A bare resource-module call looks like this — the starting point you will deliberately narrow and harden in your wrapper:

module "kv" {
  source  = "Azure/avm-res-keyvault-vault/azurerm"
  version = "0.9.1"

  name                = "kv-platform-eus-01"
  resource_group_name = azurerm_resource_group.platform.name
  location            = "eastus"
  tenant_id           = data.azurerm_client_config.current.tenant_id
}

That call gets you a vault, but with AVM’s defaults and the full AVM surface exposed — neither of which is what you ship to app teams. The whole rest of this article is turning that into a guarded, distributed, upgrade-safe platform brick.

Reading an AVM module’s interface

Before wrapping anything, read the interface — not the README prose, the actual variables. Because the AVM spec mandates a shared shape, resource modules share a recognisable set of optional inputs beyond the resource-specific ones. Know this set cold; it is the surface you decide to expose, inject or forbid in your wrapper.

AVM input	Type (shape)	What it does	Your wrapper’s stance
`tags`	`map(string)`	Tags applied to the resource	Expose (validated for mandatory keys)
`lock`	object	Apply `CanNotDelete` / `ReadOnly` management lock	Inject (org default) or expose narrowly
`role_assignments`	`map(object)`	RBAC assignments, keyed for add/remove without reindexing	Inject baseline; optionally extend
`diagnostic_settings`	`map(object)`	Log/metric categories → workspace/storage/Event Hub	Inject (non-negotiable → central LAW)
`private_endpoints`	`map(object)`	PE definitions (subnet, private DNS zone group)	Inject (non-negotiable on PE-capable services)
`managed_identities`	object	System- and/or user-assigned identity wiring	Inject or expose per pattern
`enable_telemetry`	`bool` (default `true`)	Tiny empty ARM deployment for usage metrics	Inject `false` org-wide
`<resource>-specific`	varies	e.g. `public_network_access_enabled`, `sku_name`	Mostly forbid; expose only the safe ones

That enable_telemetry row deserves a callout because it fails in a way that wastes an afternoon:

enable_telemetry: AVM modules deploy a tiny, empty ARM deployment whose name encodes the module and version. It sends no resource data to Microsoft; it lets the AVM team measure module usage. It is harmless, but in locked-down subscriptions where Microsoft.Resources/deployments is policy-denied, it will fail the run with a confusing error. Azure Policy evaluates the request when it reaches ARM, so expect the failure at apply even when the plan looks clean. Decide your org default once (we set it false and bake that into our wrappers) rather than per-call.

Inspect the real inputs instead of guessing — pull the module and read its variables directly:

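terraform init                                 # downloads modules into .terraform/modules/<call name>
terraform providers schema -json > /dev/null   # sanity-check provider wiring
# Read the module's own variables directly ("kv" is the module call name;
# the glob also catches variables split across several files):
find .terraform/modules/kv -name 'variables*.tf' -exec grep -HE '^variable' {} +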

The AVM-standard inputs and their direct-resource equivalents, so you know what the brick is wiring under the hood:

AVM input	Underlying `azurerm` mechanism it wraps	Why the wrapper is nicer
`diagnostic_settings`	`azurerm_monitor_diagnostic_setting`	One map vs N resource blocks + category enumeration
`private_endpoints`	`azurerm_private_endpoint` + DNS zone group	Subnet + zone wiring abstracted, keyed
`role_assignments`	`azurerm_role_assignment`	Keyed map survives reordering; no index churn
`lock`	`azurerm_management_lock`	Single object, attached to the resource scope
`managed_identities`	`identity {}` block + `azurerm_user_assigned_identity`	System/user identity wiring normalised
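
To make the shared shape concrete, here is a sketch of how the lock and role_assignments inputs are typically written on a resource-module call. The attribute names follow the AVM interface specification as I understand it; the module call name, the variable, the resource names and the role are placeholders, and a given module version may differ, so check its variables before copying.

```hcl
variable "platform_admins_object_id" {
  type        = string
  description = "Hypothetical: object ID of the platform admin group."
}

module "kv_interfaces_example" {
  source  = "Azure/avm-res-keyvault-vault/azurerm"
  version = "0.9.1" # illustrative pin, as elsewhere in this article

  name                = "kv-example-eus-01"
  resource_group_name = "rg-example-eus"
  location            = "eastus"
  tenant_id           = "00000000-0000-0000-0000-000000000000"
  enable_telemetry    = false

  # Single object: one management lock on the resource scope.
  lock = {
    kind = "CanNotDelete"
    name = "platform-lock"
  }

  # Keyed map: adding or removing an entry does not reindex the others.
  role_assignments = {
    platform_admins = {
      role_definition_id_or_name = "Key Vault Administrator"
      principal_id               = var.platform_admins_object_id
    }
  }
}
```

In a wrapper these would normally be injected in main.tf rather than exposed, in line with the stance column of the interface table.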

Pinning and dependency strategy

AVM resource modules are pre-1.0, and this breaks the intuition most people have about ~>. The constraint that feels safe is the one that bites.

# DANGEROUS for a 0.x module:
version = "~> 0.9"   # allows 0.9.x AND 0.10.0, 0.11.0, ...

For 0.x releases, ~> 0.9 is equivalent to >= 0.9.0, < 1.0.0. Because AVM treats the minor segment as the breaking-change segment while below 1.0, that constraint happily pulls in a breaking 0.10.0. The constraint that actually pins to a non-breaking range is the three-part form:

# Allows 0.9.1 .. 0.9.x, blocks 0.10.0:
version = "~> 0.9.1"

The full constraint-operator behaviour, made explicit so you never guess what a given string admits:

Constraint written	Expands to	Admits a breaking 0.10.0?	Verdict for a 0.x AVM module
`0.9.1`	exactly `0.9.1`	No	Best in a wrapper — deliberate, reviewed bumps
`= 0.9.1`	exactly `0.9.1`	No	Same as above, explicit form
`~> 0.9.1`	`>= 0.9.1, < 0.10.0`	No	Good — allows safe patch drift inside 0.9.x
`~> 0.9`	`>= 0.9.0, < 1.0.0`	Yes	Dangerous — the classic AVM mistake
`>= 0.9.0`	`>= 0.9.0` (unbounded)	Yes (and beyond)	Never — unbounded, will break
`>= 0.9.1, < 0.10.0`	that range	No	Verbose but exact equivalent of `~> 0.9.1`
(omitted)	latest available	Yes	Never in shared code — irreproducible

My rule across the platform repo, and why each tier pins differently:

Repo tier	Pin AVM dependencies as	Pin your wrapper as	Rationale
Wrapper (platform) modules	exact (`version = "0.9.1"`)	n/a (this is the wrapper)	The platform layer is where you absorb upgrade risk deliberately, in a PR, with a reviewed plan diff
Consuming (app) repos	inherited from the wrapper	`~> X.Y.Z` on your wrapper	Your wrappers are semver-disciplined, so `~>` is safe here; app teams inherit the AVM versions you chose
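
On the consuming side, the app-repo half of that rule might look like the sketch below. The registry hostname, the organisation (contoso), the wrapper version and every input value are placeholders; the input names are those of the spoke wrapper this article builds.

```hcl
variable "tenant_id" { type = string }
variable "central_law_id" { type = string } # hypothetical variable name
variable "kv_private_dns_zone_id" { type = string }

module "spoke" {
  # Hypothetical private-registry address: <host>/<org>/<name>/<provider>
  source  = "app.terraform.io/contoso/spoke-landing-zone/azurerm"
  version = "~> 1.4.0" # patch drift only: >= 1.4.0, < 1.5.0

  workload            = "checkout"
  location            = "eastus"
  location_short      = "eus"
  resource_group_name = "rg-checkout-eus"
  tenant_id           = var.tenant_id
  address_space       = ["10.20.0.0/24"]
  pe_subnet_prefix    = "10.20.0.0/27"

  log_analytics_workspace_id = var.central_law_id
  kv_private_dns_zone_id     = var.kv_private_dns_zone_id

  tags = {
    costCenter = "1234"
    owner      = "team@contoso.com"
  }
}
```

The app repo never names an AVM module or version; it inherits whatever the wrapper release pinned.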

Automate the bumps with Renovate so you review upgrades instead of chasing them. Renovate understands Terraform registry sources natively:

{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": ["config:recommended"],
  "terraform": { "enabled": true },
  "packageRules": [
    {
      "matchManagers": ["terraform"],
      "matchPackageNames": ["/^Azure/avm-/"],
      "groupName": "azure-verified-modules",
      "schedule": ["before 9am on monday"]
    }
  ]
}

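Each Renovate PR becomes a single reviewable unit: the version bump plus the terraform plan your CI attaches as a comment. Reproducibility comes from two places: the exact version pins fix module versions (the lock file does not record modules), and .terraform.lock.hcl fixes provider versions and checksums. What each artifact pins and where: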

Artifact	Pins	Committed?	Bumped by
`version =` in `module` block	Module version constraint	Yes (in code)	Renovate PR / manual
`.terraform.lock.hcl`	Provider versions + checksums	Yes (always commit)	`terraform init -upgrade`
`required_version` (`versions.tf`)	Terraform CLI version range	Yes	Manual, deliberate
`required_providers` (`versions.tf`)	Provider source + version range	Yes	Manual / Renovate
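
A versions.tf for the wrapper could look like the following sketch. Both ranges are illustrative assumptions, not recommendations: the AVM modules you pin declare their own Terraform and provider constraints, and the ranges here have to intersect with them.

```hcl
terraform {
  # CLI range: bumped manually and deliberately.
  required_version = ">= 1.9.0, < 2.0.0"

  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
      # Illustrative; narrow this to the range your pinned AVM modules accept.
      version = ">= 3.116.0, < 5.0.0"
    }
  }
}
```

After changing either range, run terraform init -upgrade and commit the resulting .terraform.lock.hcl in the same PR.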

Wrapping resource modules into pattern modules

Here is the core of the platform layer. We want app teams to ask for “a spoke” and get a VNet, a Key Vault, and a storage account — all with private endpoints, diagnostics, and tags already correct. They should not be able to opt out of those. The directory layout that scales:

platform-modules/
└── spoke-landing-zone/
    ├── main.tf          # composes AVM resource modules
    ├── variables.tf     # the narrow contract app teams see
    ├── outputs.tf
    ├── versions.tf      # required_providers + required_version
    └── tests/
        └── defaults.tftest.hcl

What each file owns, and the rule that keeps the wrapper a platform and not a passthrough:

File	Owns	The discipline
`main.tf`	Composition of AVM bricks + injected policy	Inject `enable_telemetry`, `diagnostic_settings`, `private_endpoints` here — never pass them through
`variables.tf`	The narrow caller contract	Expose only safe inputs; `validation` on naming + mandatory tags
`outputs.tf`	Stable outputs (ids, URIs)	Treat as API: renaming an output is a major version bump
`versions.tf`	`required_version` + `required_providers`	Pin the CLI and provider ranges deliberately
`tests/*.tftest.hcl`	Plan-level contract assertions	Assert the locked-down defaults resolve as expected

The wrapper’s main.tf composes AVM bricks and injects org policy. Note enable_telemetry, diagnostic_settings, and private_endpoints are set by us, not passed through from the caller:

locals {
  base_tags = merge(var.tags, {
    managedBy = "platform-team"
    module    = "spoke-landing-zone"
  })
}

module "vnet" {
  source  = "Azure/avm-res-network-virtualnetwork/azurerm"
  version = "0.8.1"

  name                = "vnet-${var.workload}-${var.location_short}"
  resource_group_name = var.resource_group_name
  location            = var.location
  address_space       = var.address_space
  tags                = local.base_tags
  enable_telemetry    = false

  subnets = {
    pe = {
      name             = "snet-private-endpoints"
      address_prefixes = [var.pe_subnet_prefix]
    }
  }
}

module "kv" {
  source  = "Azure/avm-res-keyvault-vault/azurerm"
  version = "0.9.1"

  name                = "kv-${var.workload}-${var.location_short}"
  resource_group_name = var.resource_group_name
  location            = var.location
  tenant_id           = var.tenant_id
  tags                = local.base_tags
  enable_telemetry    = false

  # Org default: no public access, ever.
  public_network_access_enabled = false

  diagnostic_settings = {
    central = {
      name                  = "to-law"
      workspace_resource_id = var.log_analytics_workspace_id
    }
  }

  private_endpoints = {
    vault = {
      subnet_resource_id            = module.vnet.subnets["pe"].resource_id
      private_dns_zone_resource_ids = [var.kv_private_dns_zone_id]
    }
  }
}

module "sa" {
  source  = "Azure/avm-res-storage-storageaccount/azurerm"
  version = "0.6.4"

  name                = "st${var.workload}${var.location_short}"
  resource_group_name = var.resource_group_name
  location            = var.location
  tags                = local.base_tags
  enable_telemetry    = false

  public_network_access_enabled = false
  shared_access_key_enabled     = false   # force Entra auth

  diagnostic_settings = {
    central = {
      name                  = "to-law"
      workspace_resource_id = var.log_analytics_workspace_id
    }
  }
}

The version numbers above are illustrative pins from the time of writing. Resolve the current ones for your repo from the registry and pin them exactly — never copy version strings from a blog post into production. (Yes, including this one.)
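
The layout also lists an outputs.tf. A minimal sketch follows, assuming each AVM module exposes a resource_id output (the AVM interface calls for one, but verify against the versions you pin); the output names are my own choice.

```hcl
# outputs.tf: treat these names as API; renaming one is a major version bump.
output "key_vault_id" {
  description = "Resource ID of the spoke Key Vault."
  value       = module.kv.resource_id
}

output "virtual_network_id" {
  description = "Resource ID of the spoke virtual network."
  value       = module.vnet.resource_id
}

output "storage_account_id" {
  description = "Resource ID of the spoke storage account."
  value       = module.sa.resource_id
}
```

Exposing a small set of IDs rather than whole module objects keeps the wrapper free to change its internals without breaking consumers.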

The naming convention the wrapper encodes (so app teams never hand-name a resource), with the Azure abbreviation and a worked example:

Resource	Pattern in the wrapper	Azure abbrev.	Example (`workload=checkout`, `eus`)	Constraint to respect
Resource group	`rg-${workload}-${loc}`	`rg`	`rg-checkout-eus`	≤ 90 chars
Virtual network	`vnet-${workload}-${loc}`	`vnet`	`vnet-checkout-eus`	≤ 64 chars
Subnet (PE)	`snet-private-endpoints`	`snet`	`snet-private-endpoints`	≤ 80 chars
Key Vault	`kv-${workload}-${loc}`	`kv`	`kv-checkout-eus`	3–24, globally unique
Storage account	`st${workload}${loc}`	`st`	`stcheckouteus`	3–24, lowercase alnum only
Private endpoint	`pe-${resource}-${workload}`	`pe`	`pe-kv-checkout`	2–64 chars
Log Analytics ws	`law-${scope}`	`law`	`law-central`	≤ 63 chars

The storage-account row is why the wrapper drops the hyphens from that name and why the workload validation allows only lowercase alphanumerics: storage names reject hyphens and uppercase, so encoding the rule in the module stops a whole class of naming failures before anything is deployed.

The three bricks this wrapper composes, and the policy injected onto each — the table app teams never see but every reviewer should:

Brick (AVM resource module)	Pinned	Injected non-negotiable	What it would default to bare
`avm-res-network-virtualnetwork`	`0.8.1`	PE subnet pre-created; telemetry off	No PE subnet; telemetry on
`avm-res-keyvault-vault`	`0.9.1`	`public_network_access_enabled=false`; PE + diag forced	Public access allowed; no PE/diag wired
`avm-res-storage-storageaccount`	`0.6.4`	`shared_access_key_enabled=false`; public off; diag forced	Key auth on; public allowed

Why these specific defaults are the non-negotiables, in plain risk terms:

Injected default	Risk it removes	Equivalent Azure Policy (defence in depth)
`public_network_access_enabled = false` (KV)	Vault reachable from the internet	Deny public network access on Key Vault
`private_endpoints = { vault = … }`	Secrets traffic leaving the backbone	Audit/deny resources without a PE
`shared_access_key_enabled = false` (SA)	Long-lived account keys to steal	Deny storage account key access
`diagnostic_settings → central LAW`	Blind spot — no audit trail	Deploy-if-not-exists diagnostic settings
`enable_telemetry = false`	Failed runs in locked subs	(operational, not security)

Enforcing org defaults as non-negotiable inputs

The discipline that makes a platform layer valuable is what the wrapper does not expose. Compare the AVM surface (dozens of inputs) to your variables.tf:

variable "workload" {
  type        = string
  description = "Short workload name, used in resource naming."
  validation {
    condition     = can(regex("^[a-z0-9]{2,12}$", var.workload))
    error_message = "workload must be 2-12 lowercase alphanumeric chars."
  }
}

variable "tags" {
  type        = map(string)
  description = "Caller tags; merged with mandatory platform tags."
  validation {
    condition     = contains(keys(var.tags), "costCenter") && contains(keys(var.tags), "owner")
    error_message = "tags must include costCenter and owner."
  }
}

variable "log_analytics_workspace_id" {
  type        = string
  description = "Central LAW resource ID for diagnostic settings."
}
# ... resource_group_name, location, location_short, tenant_id,
#     address_space, pe_subnet_prefix, kv_private_dns_zone_id

There is no public_network_access_enabled, no enable_telemetry, no way to skip diagnostics. App teams cannot ship a publicly exposed Key Vault through this module because the lever does not exist. That is the entire point — guardrails as types, validated at plan, not as a wiki page nobody reads. The validation blocks turn “please remember to tag things” into a hard failure.

The full contract — every input the wrapper exposes, its type, whether it is validated, and why it is safe to expose:

Input	Type	Validated?	Why it’s safe to expose
`workload`	`string`	regex `^[a-z0-9]{2,12}$`	Drives naming only; bounded charset
`tags`	`map(string)`	must contain `costCenter`, `owner`	Merged with platform tags; can’t drop mandatory keys
`location`	`string`	(optional: allow-list of regions)	Placement choice, not a security lever
`location_short`	`string`	(optional: regex)	Naming suffix
`resource_group_name`	`string`	—	Where it lands; caller owns the RG
`tenant_id`	`string`	—	Required by KV; not a guardrail
`address_space`	`list(string)`	(optional: CIDR check)	IPAM choice, governed upstream
`pe_subnet_prefix`	`string`	(optional: CIDR check)	Must fit inside `address_space`
`log_analytics_workspace_id`	`string`	—	Forces diagnostics to your LAW
`kv_private_dns_zone_id`	`string`	—	The PE zone; injecting PE needs it

And the inputs the wrapper deliberately forbids (does not expose), with what each would let an app team do if leaked:

Forbidden lever	What leaking it would allow	Kept as
`public_network_access_enabled`	Ship a public KV / storage account	Hard-coded `false` in `main.tf`
`shared_access_key_enabled` (SA)	Re-enable stealable account keys	Hard-coded `false`
`enable_telemetry`	Break runs in locked subs by accident	Hard-coded `false`
`diagnostic_settings`	Skip the audit trail / point elsewhere	Injected → central LAW
`private_endpoints`	Deploy without a PE	Injected from the wrapper’s PE subnet
`role_assignments` (raw)	Grant arbitrary RBAC inline	Baseline injected; extensions reviewed

The validation patterns worth standardising, with the message your colleague sees at plan:

Validate	Condition (sketch)	Error message
Workload name shape	`can(regex("^[a-z0-9]{2,12}$", var.workload))`	“workload must be 2-12 lowercase alphanumeric chars.”
Mandatory tags present	`contains(keys(var.tags), "costCenter") && …`	“tags must include costCenter and owner.”
Region allow-list	`contains(["eastus","centralindia"], var.location)`	“location must be an approved region.”
PE subnet inside VNet	`can(cidrhost(var.pe_subnet_prefix, 0))` proves the CIDR parses; containment needs an explicit range check	“pe_subnet_prefix must fall inside address_space.”
Env name enum	`contains(["dev","test","prod"], var.environment)`	“environment must be dev, test, or prod.”
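The range check for the PE subnet can be sketched in plain HCL. This is a minimal sketch under stated assumptions: IPv4 only, the PE subnet must sit inside the first `address_space` entry, and the names `pe_inside_vnet` and `pe_subnet_guard` are hypothetical. It uses a `precondition` on a `terraform_data` resource rather than a variable `validation` block, because cross-variable references inside `validation` are only accepted from Terraform 1.9.

```hcl
locals {
  # Assumption: IPv4 only; the PE subnet is checked against address_space[0].
  vnet_cidr = var.address_space[0]
  vnet_len  = tonumber(split("/", local.vnet_cidr)[1])
  pe_ip     = split("/", var.pe_subnet_prefix)[0]
  pe_len    = tonumber(split("/", var.pe_subnet_prefix)[1])

  # "Inside" means: the PE prefix is at least as long as the VNet prefix, and
  # masking the PE address with the VNet prefix length gives the VNet network.
  pe_inside_vnet = (
    local.pe_len >= local.vnet_len &&
    cidrhost("${local.pe_ip}/${local.vnet_len}", 0) == cidrhost(local.vnet_cidr, 0)
  )
}

# Hypothetical guard resource: creates nothing in Azure, only carries the check.
resource "terraform_data" "pe_subnet_guard" {
  lifecycle {
    precondition {
      condition     = local.pe_inside_vnet
      error_message = "pe_subnet_prefix must fall inside address_space."
    }
  }
}
```

With `address_space = ["10.20.0.0/24"]`, a `pe_subnet_prefix` of `10.20.0.0/27` satisfies the condition, while `10.21.0.0/27` does not, because its masked network address differs from the VNet's.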

Testing modules: `terraform test` and Terratest

Two layers, two tools — and they answer different questions. Native terraform test answers “does the wrapper produce the right shape?” cheaply and without deploying; Terratest answers “does it actually work in Azure?” expensively and occasionally.

Dimension	`terraform test` (native)	Terratest (Go)
Altitude	`plan`-level (also `apply` if you ask)	Real `apply` against live Azure
Speed	Seconds	Minutes (deploy + destroy)
Cost	Free (no resources)	Real Azure spend on an ephemeral sub
Deploys resources?	No (for `command = plan`)	Yes — apply then destroy
Best for	Contract / shape assertions, guardrail proofs	End-to-end behaviour, real PE/DNS resolution
Runs in CI…	On every push / PR	Nightly / pre-release
Language	HCL	Go
Failure means	Wrapper composed the wrong shape	Azure rejected or behaviour drifted

Native terraform test runs in-process and needs no deployment, which makes it the natural home for plan-level contract assertions:

# tests/defaults.tftest.hcl
run "defaults_are_locked_down" {
  command = plan

  variables {
    workload                   = "checkout"
    location                   = "eastus"
    location_short             = "eus"
    resource_group_name        = "rg-checkout"
    tenant_id                  = "00000000-0000-0000-0000-000000000000"
    address_space              = ["10.20.0.0/24"]
    pe_subnet_prefix           = "10.20.0.0/27"
    log_analytics_workspace_id = "/subscriptions/.../workspaces/law-central"
    kv_private_dns_zone_id     = "/subscriptions/.../privateDnsZones/privatelink.vaultcore.azure.net"
    tags                       = { costCenter = "1234", owner = "team@contoso.com" }
  }

  assert {
    condition     = module.kv.... == false  # assert the resolved public-access value
    error_message = "Key Vault must never allow public network access."
  }
}

Run it with:

terraform init
terraform test

The contract assertions worth writing — each proves a guardrail holds, at plan, for free:

Test (`run` block)	`command`	Asserts	Catches
`defaults_are_locked_down`	`plan`	KV `public_network_access_enabled == false`	A future edit re-exposing the vault
`storage_forces_entra`	`plan`	SA `shared_access_key_enabled == false`	Key auth creeping back in
`diagnostics_present`	`plan`	Each resource has a `diagnostic_settings` entry	Someone dropping the audit trail
`tags_merged`	`plan`	Output tags include `managedBy` + caller’s	Tag-merge logic regressions
`mandatory_tags_rejected`	`plan` (expect fail)	Missing `costCenter` fails validation	The validation block being removed
`bad_workload_rejected`	`plan` (expect fail)	`CHECKOUT!` fails the regex	Naming rule regressions
`pe_wired_to_subnet`	`plan`	KV PE references the `pe` subnet id	PE wiring breaking on refactor
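The two "expect fail" rows above could be written as follows. This is a sketch that assumes the narrow four-input contract (`workload`, `location`, `resource_group_name`, `tags`); if your wrapper requires more inputs, add them to each `variables` block. The file name is hypothetical.

```hcl
# tests/validation.tftest.hcl (hypothetical file name)

run "bad_workload_rejected" {
  command = plan

  variables {
    workload            = "CHECKOUT!" # violates ^[a-z0-9]{2,12}$
    location            = "eastus"
    resource_group_name = "rg-checkout"
    tags                = { costCenter = "1234", owner = "team@contoso.com" }
  }

  # The run passes only if the validation on var.workload fails.
  expect_failures = [var.workload]
}

run "mandatory_tags_rejected" {
  command = plan

  variables {
    workload            = "checkout"
    location            = "eastus"
    resource_group_name = "rg-checkout"
    tags                = { owner = "team@contoso.com" } # costCenter missing
  }

  # The run passes only if the validation on var.tags fails.
  expect_failures = [var.tags]
}
```

If someone later deletes a `validation` block, the corresponding run stops seeing the expected failure and the test suite goes red, which is the regression these rows are meant to catch.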

The terraform test building blocks you actually use, so you can read and write .tftest.hcl fluently:

Construct	Goes in	Purpose	Note
`run "<name>" {}`	Test file	One test case (a plan or apply)	Runs in order; later runs see earlier state
`command = plan`	`run` block	Assert without deploying	Set it explicitly in contract tests; an omitted `command` defaults to `apply`
`command = apply`	`run` block	Deploy then assert (real resources)	Needs creds + a real/ephemeral sub
`variables {}`	`run` block	Inputs for this case	Override per-run
`assert {}`	`run` block	`condition` + `error_message`	The check itself
`expect_failures`	`run` block	Assert a validation should fail	Proves guardrails reject bad input
`provider {}` / `providers`	File / `run`	Wire/alias providers for the test	Mock or real
`override_module {}`	File / `run` block	Swap a child module’s outputs for stub values	Terraform ≥ 1.7; a plain `module {}` in a `run` block instead loads a different module to execute
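To keep plan-level tests independent of Azure credentials, Terraform 1.7 and later lets a test file replace a provider with a mock. The sketch below assumes the wrapper declares a hypothetical `output "tags" { value = local.base_tags }` and again uses the narrow four-input contract. Note that it mocks only `azurerm`; if the pinned AVM brick pulls in other providers, they may need the same treatment.

```hcl
# tests/offline.tftest.hcl (hypothetical file name)

# Replace the real azurerm provider with a mock for every run in this file.
mock_provider "azurerm" {}

run "tags_merged" {
  command = plan

  variables {
    workload            = "checkout"
    location            = "eastus"
    resource_group_name = "rg-checkout"
    tags                = { costCenter = "1234", owner = "team@contoso.com" }
  }

  assert {
    # Assumption: the wrapper exposes its merged tags as output "tags".
    condition     = output.tags["managedBy"] == "platform-team"
    error_message = "Platform tag managedBy must be merged into the caller's tags."
  }

  assert {
    condition     = output.tags["costCenter"] == "1234"
    error_message = "Caller tags must survive the merge."
  }
}
```

Mocked providers return generated values for computed attributes, so this style suits assertions on wrapper logic (tag merging, naming, injected constants) rather than anything Azure computes.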

Terratest (Go) is for real end-to-end validation against an ephemeral subscription — apply, assert against live Azure, destroy. Use it in CI nightly, not on every push:

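package test

import (
    "testing"

    "github.com/gruntwork-io/terratest/modules/terraform"
    "github.com/stretchr/testify/assert"
)

func TestSpokeLandingZone(t *testing.T) {
    opts := terraform.WithDefaultRetryableErrors(t, &terraform.Options{
        TerraformDir: "../examples/default",
    })
    // Deferred before the apply so a failed apply still tears down.
    defer terraform.Destroy(t, opts)
    terraform.InitAndApply(t, opts)

    kvURI := terraform.Output(t, opts, "key_vault_uri")
    assert.Contains(t, kvURI, "vault.azure.net")
}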

In CI, authenticate with OIDC workload identity federation (no stored secrets), and target a disposable subscription so a failed destroy never pollutes a real environment:

az login --service-principal -u "$ARM_CLIENT_ID" \
  --tenant "$ARM_TENANT_ID" --federated-token "$IDTOKEN"
export ARM_USE_OIDC=true
export ARM_SUBSCRIPTION_ID="$EPHEMERAL_SUB_ID"
cd test && go test -timeout 45m ./...

The Terratest assertions worth the spend — behaviour you cannot prove at plan:

Terratest assertion	Proves	Why plan-level can’t catch it
`key_vault_uri` contains `vault.azure.net`	The vault actually came up	The URI is computed by Azure at apply; at plan it is still unknown
Private DNS resolves the KV PE name	PE + DNS zone group wired correctly	DNS resolution is runtime behaviour
Storage data-plane rejects shared-key auth	Entra-only enforcement is real	Plan asserts intent, not Azure enforcement
Diagnostic setting visible in LAW	Logs flow to the workspace	Ingestion is runtime
`terraform destroy` leaves zero resources	Clean teardown (no orphans)	Only an apply/destroy cycle exposes orphans

The CI auth model, made explicit — OIDC keyless is the right default; the same pattern powers GitHub Actions + Terraform OIDC plan/PR automation:

Auth approach	Secret stored?	Blast radius	Verdict
OIDC workload identity federation	None	Short-lived, scoped token	Use this
Service principal + client secret	Yes (long-lived)	Leaked secret = standing access	Avoid; rotate if unavoidable
Managed identity (self-hosted runner)	None	Scoped to the runner identity	Good for self-hosted agents
Personal `az login` on a runner	Yes (interactive)	The human’s full access	Never in CI

Publishing to a private registry

Wrappers are useless if teams copy-paste them. Publish them and consume by source/version. Two common backends, and the trade-off between them:

Backend	Native registry?	Source string consumers use	Versioning mechanism	Best when
Terraform Cloud / Enterprise	Yes	`app.terraform.io/<org>/<name>/azurerm`	Git tags (valid semver)	You’re on TFC/TFE already
Azure DevOps (git ref)	No	`git::https://dev.azure.com/...?ref=v1.3.0`	Tag ref in the URL	Azure DevOps shop, no TFC
Private VCS git ref (generic)	No	`git::ssh://...//module?ref=v1.3.0`	Tag ref in the URL	Any git host, lowest setup
Storage/HTTP archive	No	`https://.../module-1.3.0.zip`	Versioned artifact name	Air-gapped / artifact-store shops

Terraform Cloud / Enterprise private registry. Modules must live in repos named terraform-<provider>-<name> and are published from git tags that are valid semver. Tag, push, and the registry ingests the version:

git tag v1.3.0
git push origin v1.3.0

Consumers then reference it through the registry hostname:

module "spoke" {
  source  = "app.terraform.io/contoso/spoke-landing-zone/azurerm"
  version = "~> 1.3.0"
  # ... only the narrow contract inputs
}

Azure DevOps. There is no native Terraform registry product, so the pragmatic pattern is consuming wrappers as versioned git sources (a tag ref) pointed at Azure Repos, fronted by a CI pipeline that runs validate/test on tag:

module "spoke" {
  source = "git::https://dev.azure.com/contoso/_git/platform-modules//spoke-landing-zone?ref=v1.3.0"
}

Consumption contract: semver is a promise. Bump patch for fixes, minor for additive inputs/outputs, major for anything that changes or removes an input or alters resource addresses. The moment you rename a wrapper variable, that’s a major — app teams pinned with ~> must opt in.

The semver decision table — what each kind of change costs in version terms:

Change you made to the wrapper	Bump	Why	Consumer impact (`~> X.Y.Z`)
Fix a bug, no interface change	patch	Behaviour-preserving	Auto-picked up
Add a new optional input/output	minor	Additive, backward-compatible	Needs a constraint bump (`~> X.Y.Z` stops at the next minor; `~> X.Y` would pick it up)
Tighten a `validation` (stricter)	major	May reject previously-valid input	Must opt in
Rename / remove an input	major	Breaks callers	Must opt in
Change a resource address (for_each, etc.)	major (+ `moved`)	State migration for consumers	Must opt in; needs `moved`
Bump an internal AVM dep (no surface change)	patch/minor	Depends on AVM’s own change	Usually transparent
Change a default value	major	Silent behaviour change	Must opt in
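How much of this table a consumer experiences automatically depends on which constraint they wrote. A short illustration using the registry source from earlier in this section; the expansions in the comments follow Terraform's documented `~>` semantics:

```hcl
module "spoke" {
  source = "app.terraform.io/contoso/spoke-landing-zone/azurerm"

  # Pick one constraint; each admits a different slice of wrapper releases.
  #   "1.3.0"    -> exactly 1.3.0
  #   "~> 1.3.0" -> >= 1.3.0, < 1.4.0  (patches only)
  #   "~> 1.3"   -> >= 1.3.0, < 2.0.0  (patches and minors)
  version = "~> 1.3.0"

  # ... only the narrow contract inputs
}
```

Git-ref sources (`?ref=v1.3.0`) accept no `version` argument at all, so consumers on that backend take every release, patch included, by editing the ref.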

The publish pipeline gates that should run on a tag, in order:

Stage	Command	Gate
Format	`terraform fmt -check -recursive`	Block on diff
Validate	`terraform init -backend=false && terraform validate`	Block on error
Lint	`tflint` (+ ruleset)	Block on error
Contract tests	`terraform test`	Block on any failed `run`
Security scan	`checkov` / `tfsec` / `trivy`	Block on high severity (see scanning article)
Tag → publish	`git tag vX.Y.Z && git push origin vX.Y.Z`	Registry ingests the version

Migration path: replacing hand-rolled modules without state churn

The objection that kills AVM adoption: “we have hundreds of resources in state; switching modules means destroy/recreate.” It does not — if you use moved blocks. When you swap your old module "storage" for the AVM wrapper, the resource address changes (e.g. module.storage.azurerm_storage_account.this becomes module.sa.azurerm_storage_account.this[0]). Tell Terraform it’s the same object:

moved {
  from = module.storage.azurerm_storage_account.this
  to   = module.sa.azurerm_storage_account.this[0]
}

moved blocks are declarative and version-controlled — they survive across the whole team, unlike a one-off terraform state mv. For resources that AVM creates as a child but you previously managed standalone (or that exist in Azure but not in state), use an import block instead:

import {
  to = module.sa.azurerm_storage_account.this[0]
  id = "/subscriptions/<sub>/resourceGroups/rg-checkout/providers/Microsoft.Storage/storageAccounts/stcheckouteus"
}
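One asymmetry is worth sketching here. A `moved` block can live inside the wrapper and travel with it, but at the time of writing Terraform accepts `import` blocks only in the root module, so an import has to be declared by whoever calls the wrapper, using the full address from their root. Assuming the consumer's module call is named `spoke` (a hypothetical name), it would look like:

```hcl
# In the consumer's root module, not inside the wrapper.
# "spoke" is a hypothetical name for the consumer's call to the wrapper.
import {
  to = module.spoke.module.sa.azurerm_storage_account.this[0]
  id = "/subscriptions/<sub>/resourceGroups/rg-checkout/providers/Microsoft.Storage/storageAccounts/stcheckouteus"
}
```

Once the apply has adopted the resource, the block can be removed from the consumer repo.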

Migrate one module type at a time, behind a PR, and read the plan. A correct migration shows the resource moving with zero destroy/create lines — only in-place diffs for AVM’s added defaults (diagnostics, etc.). The mechanism-to-situation map — pick the right tool for what you’re migrating:

Situation	Tool	What it does	Plan should show
Same resource, address changed (your module → AVM)	`moved` block	Re-points state at the new address	Move, no destroy/create
Resource in Azure but not in Terraform state	`import` block	Brings the existing object under management	Import + in-place diffs
AVM moved a resource under `for_each` (internal)	`moved` block (indexed key)	Maps old address → keyed address	Move, no destroy/create
Resource truly being replaced (rename forces new)	(accept)	Genuine destroy/create	Acknowledge explicitly in the PR
One-off local fix, not for the team	`terraform state mv` (avoid)	Imperative, non-versioned	(use `moved` instead)

The migration playbook as a table — symptom in the plan, what it means, how to confirm, and the fix:

#	Plan symptom on adopting AVM	Root cause	Confirm with	Fix
1	Every storage account shows `destroy` + `create`	Resource address changed (your module → AVM)	`terraform plan` lists a destroy at the old address and a create at `... this[0]`	`moved` block from old → new address
2	A resource shows `create` though it exists in Azure	It’s in Azure but not in state	Portal/CLI shows the resource live	`import` block with the resource id
3	Replace appears only after a minor AVM bump	AVM moved the resource under `for_each`	Diff the module’s `main.tf` across versions	`moved` to the keyed address, same wrapper bump
4	Diagnostic settings show as new (in-place add)	AVM injects diagnostics you didn’t have	Plan shows `+ azurerm_monitor_diagnostic_setting`	Expected — accept the additive diff
5	Plan errors: deployment denied	`enable_telemetry = true` in a locked sub	Error names `Microsoft.Resources/deployments`	Set `enable_telemetry = false`
6	RBAC assignment churns on reorder	Unkeyed `role_assignments` list reindexed	Plan shows delete+add of identical roles	Use the keyed `map` form AVM expects
7	Private endpoint shows replace	PE subnet id changed under the hood	Compare `subnet_resource_id` old vs new	`moved` the PE resource; align the subnet
8	Whole module shows replace after provider bump	Provider major changed a schema	`.terraform.lock.hcl` provider delta	Pin provider; migrate per the provider guide

The CI gate that makes an unacknowledged replace impossible to merge — read every plan for delete+create and fail the build:

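terraform plan -no-color -out tfplan
# Fail on any change that includes a delete. That covers in-place replaces
# (["delete","create"] / ["create","delete"]) and address changes, which plan
# as a bare ["delete"] at the old address plus a ["create"] at the new one.
terraform show -json tfplan \
  | jq -e '[.resource_changes[]?
            | select(.change.actions | index("delete"))] | length == 0' \
  || { echo "::error::Unacknowledged delete or replace in plan"; exit 1; }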

Architecture at a glance

The diagram traces the module supply chain left to right — the path a resource definition travels from Microsoft’s public registry to a deployed, private, tagged spoke in your subscription. Read it as five zones. At the far left, upstream AVM ships the avm-res-* bricks (pre-1.0, the version trap lives here) and the avm-ptn-* assemblies. Those bricks flow by source + version into your platform wrapper tier — the heart of the system — where the spoke-landing-zone module composes a VNet, a Key Vault and a storage account and injects the org non-negotiables: public_network_access_enabled = false, private endpoints, forced diagnostics to the central LAW, Entra-only storage auth, and enable_telemetry = false. The wrapper’s narrow variables.tf is the membrane: app teams pass a workload name and tags, nothing dangerous.

From the wrapper, the path runs through the test + release gate — terraform test for plan-level contract assertions, Terratest for a real apply/destroy against an ephemeral subscription over keyless OIDC, and the replace gate that fails CI on any unacknowledged destroy/create. Only a green build tags a version into the private registry (Terraform Cloud or a git ref), from which the rightmost zone — 40+ app repos — consumes the wrapper by source/version with narrow inputs only, and terraform apply lands a spoke that is private and observable by construction. The five numbered badges mark the real hazards on this path: the ~> 0.x pin trap on the upstream brick, telemetry failing in a locked subscription, a passthrough wrapper that leaks a public lever, the state-address churn a minor bump can cause, and the brownfield-import gap when adopting AVM over existing Azure resources. Follow the numbers and you have both the architecture and the failure map in one view.

Real-world scenario

Northwind Cloud Platform is the four-engineer central team behind a retailer’s Azure estate: 40+ application repos, each owning one or more spokes inside a CAF-aligned landing zone, all on Terraform with state in Azure Storage and CI in Azure DevOps. Eighteen months ago every app team hand-rolled azurerm_ resources; a security review found nine publicly-accessible storage accounts and a Key Vault open to the internet, and “fix it” meant nine separate PRs against nine codebases. The platform team’s mandate after that review: make “private, tagged, observable” the only way to ship a spoke, and make estate-wide convention changes a one-PR operation. Their answer was the spoke-landing-zone wrapper over AVM resource bricks, distributed as a versioned git ref, pinned Azure/avm-res-storage-storageaccount/azurerm at an exact 0.6.x, consumed by every repo with ?ref=v1.x.

It worked beautifully for six months — until a routine Renovate PR bumped a single AVM minor in the wrapper, and the terraform plan that CI attached showed every storage account across 40 repos scheduled for destroy/create. The exact pin had not saved them, because the upgrade itself was the breaking event: that AVM release had moved the storage account resource under a for_each map, changing its address from ...this to ...this["default"]. A naive merge — and the team’s normal flow was “Renovate is green-ish, approve” — would have nuked 40 production data planes in a single apply. The engineer reviewing it noticed the plan was suspiciously long, scrolled, and saw the -/+ lines. That was the whole margin: one human reading a plan.

The breakthrough was reframing the problem. The issue was never “which version” — exact pins control when you take an upgrade, not whether it is safe. The issue was that an AVM bump in a shared wrapper is, structurally, a state migration, and they had been treating it as a dependency bump. The fix was to absorb the address change inside the wrapper with a moved block, shipped in the same version bump so all 40 consumers inherited it transparently:

moved {
  from = module.sa.azurerm_storage_account.this
  to   = module.sa.azurerm_storage_account.this["default"]
}

Then they made this class of failure impossible to miss rather than relying on a tired reviewer: a CI step that parses terraform show -json and fails the build if the plan contains any destroy/create not explicitly acknowledged in the PR description. They also moved the wrapper’s exact-pin bumps behind a dedicated review checklist (“is this AVM minor a possible address change? diff the module’s main.tf”) and added a nightly Terratest run against an ephemeral subscription so behaviour regressions surfaced before a tag, not after.

Six months on, the estate is in a different posture. A new compliance rule (force customer-managed keys on storage) landed as one wrapper PR with a reviewed plan, propagated to all 40 repos by bumping their ?ref= to the new v1.x tag, with the replace-gate guaranteeing zero data-plane loss. The lesson on their wall: “Every AVM bump in a shared wrapper is a state migration until a moved-aware plan proves otherwise.” The incident, as a timeline, because the order of moves is the lesson:

Time	Event	Action taken	Effect	What it should have been
Day 0	Renovate bumps an AVM minor	(PR opened, CI green-ish)	—	Treat every AVM bump as a possible migration
Day 0 +5 min	Plan attached to PR	Reviewer scrolls the long plan	Spots 40× `destroy`/`create`	The save — but luck, not process
Day 0 +20 min	Root cause found	Diff the module `main.tf` across versions	Resource moved under `for_each`	—
Day 0 +1 h	Fix drafted	Add `moved` block in the same wrapper bump	Plan now shows move, zero replace	Correct fix
Day 1	Shipped	Tag `v1.4.0`; consumers inherit transparently	40 repos migrate with no churn	—
Day 2	Hardened	Add `jq` replace-gate to CI	Unacknowledged replace can’t merge	The durable fix
+1 week	Institutionalised	Nightly Terratest + AVM-bump checklist	Regressions caught pre-tag	The process change

Advantages and disadvantages

The wrap-don’t-fork, AVM-as-substrate model both enables a real platform layer and carries sharp edges you must respect. Weigh it honestly:

Advantages (why this model helps)	Disadvantages (why it bites)
One shared spec means generic org policy (force diagnostics/PE everywhere) instead of bespoke per-resource wiring	The shared spec is still pre-1.0 — `~>` semantics are inverted and the trap is in the shared wrapper
WAF-aligned defaults: you inherit good defaults instead of the bare minimum that compiles	“Good defaults” still aren’t your defaults — you must inject org policy, or it’s just a fancier resource
Estate-wide convention changes become one pinned-version wrapper PR	A bad wrapper bump has a 40-repo blast radius; discipline is non-optional
Migration is non-destructive with `moved`/`import` — adopt over existing state, zero recreate	A wrong `moved` target silently destroys and recreates — you must read the plan
Guardrails as types (`validation`, omitted levers) catch violations at `plan`, not in a quarterly review	Narrowing the surface is work; the lazy path (passthrough) gives none of the value
Upstream fixes flow to you for free because you compose, don’t fork	You’re coupled to AVM’s release cadence and its internal address choices
Keyed maps (`role_assignments`, `diagnostic_settings`) survive reordering — no index churn	Telemetry’s empty deployment fails plans in locked subscriptions until you set it `false`
Two-tier testing (`terraform test` + Terratest) proves both shape and behaviour	Terratest costs real Azure spend and time; you must run it judiciously

The model is right when you operate Terraform at scale (many repos, many spokes) and need conventions enforced by construction rather than by review. It is overkill for a single-team, handful-of-resources estate where a wrapper is more ceremony than value — there, consume AVM resource modules directly. It bites hardest on teams that pin pre-1.0 modules with ~>, that build passthrough “wrappers” with no injected policy, and that merge AVM bumps without reading the plan. Every one of those is a manageable failure — but only if you know it exists, which is the point of the deep sections above.

Hands-on lab

Build a minimal spoke-landing-zone wrapper over an AVM brick, prove its guardrail holds at plan with terraform test, and prove the ~> 0.x trap is real, all without deploying a thing (free). Run in any shell with Terraform ≥ 1.7 (terraform test shipped in 1.6; 1.7 adds mocks and overrides) and the azurerm provider available. No Azure spend: every step is plan-level.

Step 1 — Scaffold the wrapper.

mkdir -p spoke-landing-zone/tests && cd spoke-landing-zone
cat > versions.tf <<'EOF'
terraform {
  required_version = ">= 1.7.0"
  required_providers {
    azurerm = { source = "hashicorp/azurerm", version = "~> 4.0" }
  }
}
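provider "azurerm" {
  features {}
  # Nothing is applied, but plan still authenticates: sign in (az login) and
  # export ARM_SUBSCRIPTION_ID first, because azurerm 4.x requires a subscription ID.
  resource_provider_registrations = "none" # 4.x replacement for skip_provider_registration
}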
EOF

Step 2 — A narrow contract with a validated input. This is the membrane app teams see.

cat > variables.tf <<'EOF'
variable "workload" {
  type = string
  validation {
    condition     = can(regex("^[a-z0-9]{2,12}$", var.workload))
    error_message = "workload must be 2-12 lowercase alphanumeric chars."
  }
}
variable "location"            { type = string }
variable "resource_group_name" { type = string }
variable "tags" {
  type = map(string)
  validation {
    condition     = contains(keys(var.tags), "costCenter") && contains(keys(var.tags), "owner")
    error_message = "tags must include costCenter and owner."
  }
}
EOF

Step 3 — Compose one AVM brick with injected, non-negotiable policy. A storage account: public off, Entra-only, telemetry off — none of it exposed to the caller.

cat > main.tf <<'EOF'
locals { base_tags = merge(var.tags, { managedBy = "platform-team" }) }

module "sa" {
  source  = "Azure/avm-res-storage-storageaccount/azurerm"
  version = "0.6.4"   # exact pin — the wrapper absorbs upgrade risk deliberately

  name                = "st${var.workload}eus"
  resource_group_name = var.resource_group_name
  location            = var.location
  tags                = local.base_tags

  enable_telemetry              = false   # injected, not exposed
  public_network_access_enabled = false   # injected — the lever app teams DON'T get
  shared_access_key_enabled     = false   # force Entra auth
}
EOF
terraform init

Expected: terraform init downloads Azure/avm-res-storage-storageaccount/azurerm at 0.6.4 and the azurerm provider; “Terraform has been successfully initialized!”.

Step 4 — Prove the guardrail at plan with a contract test.

cat > tests/defaults.tftest.hcl <<'EOF'
run "storage_is_locked_down" {
  command = plan
  variables {
    workload            = "checkout"
    location            = "eastus"
    resource_group_name = "rg-checkout"
    tags                = { costCenter = "1234", owner = "team@contoso.com" }
  }
  assert {
    condition     = module.sa.... == false   # resolve the public-access output your AVM version exposes
    error_message = "Storage must never allow public network access."
  }
}
EOF
terraform test

Expected: terraform test runs the run block at plan level and reports the assertion result; no resources are created. (Replace the module.sa.... placeholder with an output your pinned AVM version actually exposes before running, because the elided reference will not parse as written; the module’s outputs.tf lists the names.)

Step 5 — Prove the validation rejects bad input. Feed an illegal workload name and watch it fail at plan, not in production.

terraform plan -var 'workload=CHECKOUT!' \
  -var 'location=eastus' -var 'resource_group_name=rg-x' \
  -var 'tags={costCenter="1",owner="a@b.com"}'
# Expected: Error — "workload must be 2-12 lowercase alphanumeric chars."

Step 6 — Prove the ~> 0.x trap is real. Loosen the pin and watch init -upgrade reach for a higher minor than you intended.

# Temporarily change the version line to the DANGEROUS form and re-init:
#   version = "~> 0.6"     # allows 0.7.0, 0.8.0, ... a BREAKING minor
sed -i.bak 's/version = "0.6.4"/version = "~> 0.6"/' main.tf
terraform init -upgrade
# Read which version actually resolved:
jq -r '.Modules[] | select(.Key == "sa") | .Version' .terraform/modules/modules.json
# Restore the safe exact pin:
mv main.tf.bak main.tf && terraform init -upgrade

The point: ~> 0.6 silently admits a breaking 0.7.x/0.8.x; only 0.6.4 or ~> 0.6.4 holds the line.

Validation checklist. You built a wrapper that injects three security non-negotiables the caller cannot override, proved the guardrail at plan with terraform test (no deploy), proved the validation block rejects bad naming, and demonstrated the pre-1.0 ~> trap first-hand. The lab steps mapped to what each proves:

Step	What you did	What it proves	Real-world analogue
3	Inject `public_network_access_enabled=false`	The lever app teams don’t get	Every guarded brick in the wrapper
4	`terraform test` the locked-down default	Guardrails verified at `plan`, for free	The CI contract suite
5	Feed an illegal `workload`	Conventions are hard failures, not wiki text	Naming policy enforced as a type
6	Loosen to `~> 0.6`, re-init	The pre-1.0 `~>` trap is real	The Renovate-bump near-miss

Cleanup. No Azure resources were created (everything was plan-level), so just remove the directory.

cd .. && rm -rf spoke-landing-zone

Cost note. Zero — every command is init/plan/test, which create nothing in Azure. (A real Terratest run would cost a few rupees of ephemeral-subscription spend for the minutes resources exist; this lab deliberately avoids apply.)

Common mistakes & troubleshooting

This is the playbook — the part you bookmark. First as a scannable table, then the entries that bite hardest with the full reasoning underneath.

#	Symptom	Root cause	Confirm (exact cmd / check)	Fix
1	A breaking module version got pulled despite `~>`	Pre-1.0: `~> 0.9` admits `0.10.0`	Read resolved version in `.terraform/modules/modules.json` (the lock file tracks providers only)	Pin `~> 0.9.1` (three-part) or exact in wrappers
2	`terraform plan` errors on a deployment resource	`enable_telemetry=true` in a policy-locked sub	Plan error names `Microsoft.Resources/deployments`	Set `enable_telemetry=false` in every wrapper
3	App team shipped a public Key Vault / storage	Wrapper exposes the lever (passthrough)	grep wrapper for `public_network_access_enabled`	Remove the input; hard-code `false` in `main.tf`
4	Plan shows 40× `destroy`/`create` after a bump	AVM moved the resource under `for_each`	`terraform show -json` → `["delete"]` at the old address, `["create"]` at the keyed one	`moved` block to the keyed address, same wrapper bump
5	A resource shows `create` but exists in Azure	In Azure, not in Terraform state	Portal/CLI shows the live resource	`import` block with the resource id
6	RBAC assignments churn every plan	Unkeyed `role_assignments` list reindexed	Plan shows delete+add of identical roles	Use the keyed `map` form AVM expects
7	`version` constraint won’t resolve	Constraint impossible (e.g. `>= 0.9, < 0.9`)	`terraform init` “no available releases” error	Fix the range; check the registry for real versions
8	Consumer can’t find the published module	Repo not named `terraform-<provider>-<name>` or no semver tag	Registry shows no versions	Rename repo; push a valid `vX.Y.Z` tag
9	`terraform test` passes but apply fails in Azure	Plan-level test can’t catch runtime behaviour	Terratest apply surfaces the real error	Add a Terratest assertion for that behaviour
10	Wrapper `variables.tf` is huge	It mirrors the AVM surface (no narrowing)	Count inputs vs the AVM module’s	Narrow deliberately; inject the rest
11	OIDC login fails in CI	Federated credential / subject mismatch	`az login --federated-token` error	Fix the federated credential subject/audience
12	Provider major bump replaces everything	`azurerm` v3→v4 schema change	`.terraform.lock.hcl` provider delta	Pin provider; follow the provider upgrade guide
13	`moved` block did nothing (still replaces)	`from`/`to` address wrong	Plan still shows `-/+`	Correct the exact source/target address strings
14	Renovate raises no AVM PRs	`packageRules` pattern doesn’t match	Renovate logs / dry-run	Fix `matchPackageNames` to `/^Azure/avm-/`

The expanded form, for the entries that cause the most damage:

1. A breaking module version got pulled despite a ~> constraint. Root cause: The module is pre-1.0 and ~> 0.9 expands to >= 0.9.0, < 1.0.0, so it admits a breaking 0.10.0 because AVM treats the minor segment as breaking below 1.0. Confirm: Read the resolved version in .terraform/modules/modules.json (module versions are not recorded in .terraform.lock.hcl, which tracks providers only) and compare to what you intended. Fix: In wrappers, pin exact (version = "0.9.1"); if you must allow drift, use the three-part ~> 0.9.1 (allows 0.9.x, blocks 0.10.0). Never ~> 0.9 on a 0.x module.

2. terraform plan fails on a deployment resource in a locked subscription. Root cause: enable_telemetry = true (the AVM default) deploys a tiny empty Microsoft.Resources/deployments; in subscriptions where that operation is policy-denied, the plan fails with a confusing error that doesn’t obviously point at telemetry. Confirm: The plan/apply error names a Microsoft.Resources/deployments operation being denied by policy. Fix: Bake enable_telemetry = false into every wrapper, decided once org-wide — not per call.

3. An app team shipped a publicly-exposed Key Vault or storage account. Root cause: The wrapper exposed public_network_access_enabled (a passthrough), so the team could set it true — the wrapper added no guardrail. Confirm: grep -r public_network_access_enabled in the wrapper finds it in variables.tf (exposed) rather than only hard-coded in main.tf. Fix: Remove the input from variables.tf; hard-code public_network_access_enabled = false in main.tf. Guardrails are the levers you omit. Back it with an Azure Policy deny for defence in depth.

4. After a minor AVM bump, the plan shows every instance of a resource scheduled for destroy/create. Root cause: The AVM release changed the resource’s address (typically moving it under a for_each map), so Terraform sees the old address gone and a new one created: a destroy/create, which on data-bearing resources means data loss. Confirm: terraform show -json tfplan | jq '.resource_changes[] | {address, actions: .change.actions}' shows ["delete"] at the old address and ["create"] at the new keyed address. Fix: Add a moved block from the old address to the new keyed address, shipped in the same wrapper version so consumers inherit it transparently; gate CI to reject unacknowledged replaces.

5. A resource shows create in the plan even though it already exists in Azure. Root cause: The resource exists in Azure but is not in Terraform state (created out-of-band, or being adopted into AVM as a child it didn’t manage before). Confirm: The portal/CLI shows the resource live; terraform state list doesn’t include it. Fix: Use an import block (to = the AVM resource address, id = the Azure resource id) so Terraform adopts it instead of creating a duplicate; read the plan for in-place-only diffs.

6. RBAC role assignments churn (delete + re-add identical roles) on every plan. Root cause: role_assignments passed as an unkeyed list gets reindexed when the order changes, so Terraform sees deletes and adds of the same assignments. Confirm: The plan shows azurerm_role_assignment deletes and creates with identical role/scope. Fix: Pass role_assignments as the keyed map AVM expects, so add/remove never reindexes the survivors.

10. The wrapper’s variables.tf is nearly a copy of the AVM module’s inputs. Root cause: The “wrapper” is a passthrough — it forwards the full AVM surface, so it provides no narrowing and no guardrails (the entire reason it exists). Confirm: The input count roughly matches the AVM module’s, and security levers (public_network_access_enabled, shared_access_key_enabled) appear in variables.tf. Fix: Narrow deliberately to the small contract app teams need; inject the rest as constants. A platform layer is defined by what it refuses to expose.
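Items 4 and 5 above each come down to a few lines of declarative HCL. The following is a minimal sketch under stated assumptions, not the exact code for any particular AVM release: the module call names (storage, spoke, keyvault), the "primary" key, the inner resource addresses and the resource ID are all hypothetical placeholders, so take the real addresses from your own plan output. It assumes Terraform 1.5 or later, which supports both block types.

```hcl
# In the wrapper, shipped in the same release as the AVM bump (item 4).
# Assumption: the upstream module moved the resource under a for_each map,
# so state is re-pointed from the old address to the new keyed address.
moved {
  from = module.storage.azurerm_storage_account.this
  to   = module.storage.azurerm_storage_account.this["primary"]
}

# In the root configuration that owns the state (item 5). import blocks are
# only valid in the root module, so the address is written from the root.
# The subscription, resource group and vault names are placeholders.
import {
  to = module.spoke.module.keyvault.azurerm_key_vault.this
  id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-example/providers/Microsoft.KeyVault/vaults/kv-example"
}
```

After adding either block, the plan should report a move or an import with no delete/create pair. If it still shows a replace, the address in the block does not match the address in the plan.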

Best practices

Wrap, never fork. Compose AVM resource bricks and inject org policy; forking means you own maintenance forever and lose upstream fixes. The wrapper is your code; the bricks stay upstream.
Pin AVM dependencies exactly in wrappers. version = "0.9.1", not ~> 0.9 — pre-1.0 minors are breaking. Take upgrades deliberately, in a PR, with a reviewed plan. Let app repos pin your wrapper with ~> X.Y.Z.
Treat every AVM bump as a possible state migration. Before merging a bump in a shared wrapper, diff the module’s main.tf across versions for address changes and assume a moved block may be required.
Gate CI on unacknowledged replaces. A jq check over terraform show -json that fails on any destroy/create not acknowledged in the PR turns a 40-data-plane disaster into a build failure.
Define guardrails by omission. The levers you don’t expose (public_network_access_enabled, shared_access_key_enabled, enable_telemetry) are the guardrails. Inject them as constants in main.tf.
Validate conventions as types. validation blocks on naming and mandatory tags turn “please remember” into a hard plan-time failure that no one can skip.
Set enable_telemetry = false org-wide. Decide once and bake into wrappers; it prevents the empty-deployment plan failure in locked subscriptions.
Test at two altitudes. terraform test for fast plan-level contract assertions on every push; Terratest against an ephemeral subscription nightly for real behaviour (PE/DNS resolution, Entra-only enforcement).
Authenticate CI with OIDC. Workload identity federation — no stored secrets, short-lived scoped tokens — for both plan automation and Terratest.
Publish; never copy-paste. Distribute wrappers by source/version from a private registry or semver git tag, and honour semver: renaming an input or changing an address is a major.
Use moved/import, not state mv. Declarative, version-controlled migration survives across the team; imperative state surgery doesn’t.
Commit .terraform.lock.hcl. It’s the only thing that makes provider versions reproducible across the team and CI.
Layer Azure Policy under the wrapper. The wrapper prevents violations at author time; Policy catches anything provisioned outside it. Belt and braces — see Azure Policy as code.
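Several of these practices (exact pins, guardrails by omission, conventions as types, telemetry off) meet in the wrapper's variables.tf and main.tf. A minimal sketch, assuming a Key Vault wrapper: the variable names, the kv- naming convention, the mandatory tag keys and the 0.9.1 version are illustrative placeholders, and the module inputs shown should be checked against the AVM version you actually pin.

```hcl
# variables.tf (wrapper): the narrow contract app teams see.
variable "workload" {
  type        = string
  description = "Short workload name used to build resource names."

  validation {
    condition     = can(regex("^[a-z0-9]{3,12}$", var.workload))
    error_message = "workload must be 3-12 lowercase letters or digits."
  }
}

variable "location" { type = string }
variable "resource_group_name" { type = string }
variable "tenant_id" { type = string }

variable "tags" {
  type        = map(string)
  description = "Resource tags; owner and cost_center are mandatory."

  validation {
    condition     = alltrue([for k in ["owner", "cost_center"] : contains(keys(var.tags), k)])
    error_message = "tags must include owner and cost_center."
  }
}

# main.tf (wrapper): compose the AVM brick and inject the non-negotiables.
module "keyvault" {
  source  = "Azure/avm-res-keyvault-vault/azurerm"
  version = "0.9.1" # exact pin; "~> 0.9" would also admit a breaking 0.10.0

  name                = "kv-${var.workload}"
  location            = var.location
  resource_group_name = var.resource_group_name
  tenant_id           = var.tenant_id
  tags                = var.tags

  # Guardrails by omission: constants here, never variables.
  public_network_access_enabled = false
  enable_telemetry              = false
}
```

Because public_network_access_enabled never appears in variables.tf, a caller has no argument to set; the only way to change it is a reviewed change to the wrapper itself.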

The practices as a pre-flight checklist for any new wrapper:

Check	Pass criterion	Why it matters
AVM deps pinned exactly	No `~> 0.x` anywhere in the wrapper	Avoids the breaking-minor trap
`enable_telemetry` injected `false`	Not exposed; constant in `main.tf`	No plan failure in locked subs
Security levers omitted	`public_network_access_enabled` etc. not in `variables.tf`	Guardrail by construction
Diagnostics + PE injected	Forced to central LAW / PE subnet	Observability + isolation non-optional
Naming + tags validated	`validation` blocks present	Conventions are hard failures
Contract tests exist	`terraform test` covers the guardrails	Regressions caught at `plan`
Replace gate in CI	`jq` check fails on any unacknowledged `delete` + `create` action pair	No accidental data-plane loss
Published by semver	Tag + registry/git ref, not copy-paste	One bump propagates everywhere
Lock file committed	`.terraform.lock.hcl` in VCS	Reproducible provider versions
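The "Contract tests exist" row can be sketched as a native test file. This is an illustration only: it assumes a wrapper whose variables.tf declares workload and tags inputs with validation blocks (hypothetical names), and that the azurerm provider is configured for the test run. The file path and all values are placeholders.

```hcl
# tests/contract.tftest.hcl (hypothetical path)

# Valid defaults shared by every run block; each run overrides one input.
variables {
  workload            = "payments"
  location            = "centralindia"
  resource_group_name = "rg-example"
  tenant_id           = "00000000-0000-0000-0000-000000000000"
  tags = {
    owner       = "team-payments"
    cost_center = "cc-1234"
  }
}

# The naming convention must be a hard failure at plan.
run "rejects_badly_named_workload" {
  command = plan

  variables {
    workload = "Payments_API" # uppercase and underscore break the convention
  }

  # The run passes only if validation on var.workload fails.
  expect_failures = [var.workload]
}

# Mandatory tags must be a hard failure at plan.
run "rejects_missing_mandatory_tags" {
  command = plan

  variables {
    tags = { owner = "team-payments" } # cost_center deliberately missing
  }

  expect_failures = [var.tags]
}
```

Running terraform test from the wrapper's root executes both runs at plan level, so nothing is deployed.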

Security notes

Guardrail by omission is a security control. The single highest-leverage security decision here is not exposing public_network_access_enabled and shared_access_key_enabled. An app team cannot ship a public vault or key-auth storage account if the lever doesn’t exist in the contract — this beats any after-the-fact scan.
Force Entra-only auth and private endpoints. Inject shared_access_key_enabled = false (kills stealable account keys) and private_endpoints = {…} (keeps secrets/data traffic on the backbone) as non-negotiables. Pair with the private DNS zones the PE needs.
Keyless CI with OIDC. Use workload identity federation for plan automation and Terratest; never store a long-lived service-principal secret in a pipeline. Scope the federated identity to the specific repo/branch/environment.
Least-privilege test subscription. Terratest runs apply with real credentials — scope that identity to a disposable subscription so a failed destroy or a compromised runner can’t touch production.
Inject a baseline role_assignments, don’t expose raw RBAC. Let the wrapper grant the minimal assignments the pattern needs (keyed map); review any extension. Don’t let app teams hand-write arbitrary role grants inline.
Diagnostics to a central, access-controlled LAW. Forcing diagnostic_settings to your central Log Analytics workspace means every spoke is auditable; lock down who can read that workspace.
Pin and scan the supply chain. Exact AVM pins plus a committed lock file mean you know exactly what code runs; run checkov/tfsec/trivy in the publish pipeline so a wrapper can’t ship a misconfiguration. See Checkov, Trivy & tfsec IaC scanning.
Defence in depth with Azure Policy. The wrapper enforces at author time; Azure Policy deny/deployIfNotExists enforces at the platform — anything provisioned outside the wrapper still gets caught.
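The baseline role_assignments note above can be sketched as a keyed map built inside the wrapper. The variable name, the map key and the choice of role are illustrative assumptions; the attribute names follow the AVM role_assignments interface as I understand it, so verify them against the module version you pin.

```hcl
# Hypothetical input: the object ID of a platform-owned identity.
variable "platform_reader_principal_id" {
  type        = string
  description = "Object ID of the platform identity that reads secrets."
}

locals {
  # Keyed map: adding or removing an entry never reindexes the others,
  # so the plan shows only the assignment that actually changed.
  baseline_role_assignments = {
    platform_secrets_reader = {
      role_definition_id_or_name = "Key Vault Secrets User"
      principal_id               = var.platform_reader_principal_id
    }
  }
}

# In the wrapper's AVM module call:
#   role_assignments = local.baseline_role_assignments
```

Keeping the map in locals rather than exposing it as an input means any new grant arrives as a reviewed wrapper change.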

The security controls mapped to the threat each removes and the policy backstop:

Control (in the wrapper)	Threat removed	Azure Policy backstop
Omit `public_network_access_enabled`	Internet-exposed KV/SA	Deny public network access
`shared_access_key_enabled = false`	Long-lived account-key theft	Deny storage key access
Injected `private_endpoints`	Data/secrets off the backbone	Audit/deny resources without PE
Injected `diagnostic_settings` → LAW	Unaudited resource	DeployIfNotExists diagnostics
OIDC keyless CI	Stolen long-lived pipeline secret	(conditional access on the identity)
Disposable test subscription	Blast radius of a CI compromise	Management-group scoping
Baseline keyed `role_assignments`	Over-broad inline RBAC	Deny role assignments at wrong scope
Supply-chain scan in publish CI	Shipping a misconfigured wrapper	(gate is the policy here)
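The OIDC row in the table above corresponds to a provider configuration with no secret in it. A minimal sketch, assuming the pipeline's identity already has a federated credential scoped to the repo and that the variable names below (all hypothetical) are supplied by CI:

```hcl
# Hypothetical inputs supplied by the pipeline; none of them is a secret.
variable "test_subscription_id" {
  type        = string
  description = "Disposable subscription used for apply/destroy tests."
}

variable "tenant_id" { type = string }
variable "client_id" { type = string }

provider "azurerm" {
  features {}

  # Exchange the CI system's short-lived OIDC token for an Azure token
  # instead of authenticating with a stored client secret.
  use_oidc        = true
  subscription_id = var.test_subscription_id
  tenant_id       = var.tenant_id
  client_id       = var.client_id
}
```

The same settings can be supplied through the ARM_USE_OIDC, ARM_SUBSCRIPTION_ID, ARM_TENANT_ID and ARM_CLIENT_ID environment variables, which keeps the provider block identical between local and CI runs. On GitHub Actions the job also needs the id-token: write permission to request the token.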

Cost & sizing

There is no per-hour charge for “an AVM module” — the cost story here is operational spend plus the resources your wrappers deploy, and the way the platform layer saves money is by making convention changes one PR instead of forty. The drivers, what each costs, and how the platform layer moves the number:

Cost driver	What you pay for	Rough INR / month	How the platform layer affects it
Terratest on an ephemeral subscription	Minutes of real resources during apply→destroy	~₹500–2,000 (nightly, small spokes)	Keep spokes minimal; destroy reliably; run nightly not per-push
CI compute (plan/test/publish)	Pipeline minutes	Often free tier / ~₹0–1,000	Plan-level `terraform test` is cheap; gate Terratest to nightly
Terraform Cloud / Enterprise	Per-user or per-run, if used	Varies (free tier exists)	Optional — git-ref distribution is ₹0
Private registry storage	Negligible (git tags)	~₹0	Tags cost nothing; storage/HTTP archive is tiny
The deployed spoke itself	VNet (free), KV (per-op), SA, PE	Per the resources (PE ~₹600–900/PE/mo)	Wrapper standardises sizing; PEs add a per-endpoint hourly charge
Renovate (self-hosted or app)	Compute / free GitHub app	~₹0	Saves engineer-hours chasing bumps
Engineer time (the real cost)	Hours per estate-wide change	(the big one)	One wrapper PR vs 40 hand-edits — the whole ROI

Right-sizing guidance: the only recurring infra cost the platform layer adds is the ephemeral-subscription Terratest spend — keep example spokes minimal (one VNet, one KV, one SA, the PEs under test) and ensure destroy is reliable (the replace-gate and a defer terraform.Destroy prevent orphans that quietly bill). Private endpoints are the one line item to watch in the deployed spoke: each PE carries a small hourly charge plus per-GB processing, so don’t inject PEs on services that don’t need them. Everything else — the registry (git tags), CI plan/test, Renovate — is effectively free. The justification is engineer-time: a single compliance change (force CMK, deny public) that used to be 40 PRs becomes one reviewed wrapper bump, and the replace-gate makes that bump safe — which is worth far more than the few hundred rupees of nightly test spend.

A rough monthly picture for a 40-repo estate: nightly Terratest (~₹1,000–2,000), CI (free tier to ~₹1,000), registry/Renovate (~₹0), plus whatever the spokes themselves cost (dominated by PEs and the storage/KV operations, not the platform layer). The platform layer’s line on the bill is small; its line on the risk and engineer-hours ledger is where it pays for itself.

Interview & exam questions

1. What is the difference between an AVM resource module and a pattern module, and where does your platform wrapper fit? A resource module (avm-res-*) provisions one logical resource plus its directly-dependent children; a pattern module (avm-ptn-*) provisions a whole multi-resource architecture. Resource modules are LEGO bricks, pattern modules are pre-built assemblies. Your wrapper is a third tier — your own pattern module composed from AVM resource bricks that injects your org’s non-negotiables. You wrap, not fork.

2. Why is version = "~> 0.9" dangerous for an AVM module, and what should you use instead? AVM modules are pre-1.0, and AVM treats the minor segment as breaking below 1.0. ~> 0.9 expands to >= 0.9.0, < 1.0.0, so it admits a breaking 0.10.0. In wrappers, pin exact (0.9.1); if you need patch drift, use the three-part ~> 0.9.1 (allows 0.9.x, blocks 0.10.0).

3. What is enable_telemetry and why does it sometimes break a plan? AVM modules deploy a tiny, empty Microsoft.Resources/deployments whose name encodes module + version, used to measure usage — it sends no resource data. In subscriptions where that deployment operation is policy-denied, the plan fails with a confusing error. Set enable_telemetry = false once, org-wide, in your wrappers.

4. What makes a wrapper a real platform layer rather than a passthrough? What it does not expose. A passthrough forwards the full AVM surface, so an app team can still ship a public Key Vault. A platform layer exposes a narrow, validated contract (workload, tags, central LAW id) and injects the rest (public_network_access_enabled = false, forced PE/diagnostics, telemetry off) as constants the caller cannot override — guardrails as types, validated at plan.

5. How do you migrate a hand-rolled module to AVM without destroying resources? Use a moved block to re-point state from the old resource address to the new AVM address (the address changes, the object doesn’t), and an import block for resources that exist in Azure but not in state. Migrate one module type per PR and read the plan — a correct migration shows moves and in-place diffs with zero destroy/create.

6. A minor AVM bump in a shared wrapper shows every storage account scheduled for destroy/create. What happened and how do you fix it? The AVM release changed the resource’s address (moved it under a for_each map), so Terraform sees the old address removed and a new one created. Absorb the change with a moved block to the new keyed address, shipped in the same wrapper version so consumers inherit it transparently, and gate CI to reject unacknowledged replaces.

7. Why pin AVM exactly in wrappers but ~> X.Y.Z in app repos? The wrapper is where you absorb upgrade risk deliberately, in a reviewed PR with a plan diff, so it uses exact pins. Your wrappers are semver-disciplined, so app repos can safely use ~> X.Y.Z on your wrapper and inherit the AVM versions you chose, picking up your patch releases automatically without ever pinning AVM directly. The three-part form admits only X.Y.* releases; a team that also wants your minors automatically would use the two-part ~> X.Y.

8. When do you use terraform test versus Terratest? terraform test runs assertions in-process against either a plan (command = plan) or a real apply (command = apply). At plan level it is fast, free and deploys nothing, which makes it ideal for contract/shape checks (“does the wrapper produce the locked-down shape?”) on every push. Terratest runs a real apply/assert/destroy against an ephemeral subscription; it is slow and costs money, so reserve it for behaviour you can’t see at plan (PE/DNS resolution, Entra-only enforcement) and run it nightly.

9. How do you authenticate Terraform CI to Azure without storing secrets? OIDC workload identity federation: the CI system presents a short-lived federated token (az login --federated-token, ARM_USE_OIDC=true) scoped to a specific repo/branch/environment, so there’s no long-lived service-principal secret to leak. Target a disposable subscription for any apply.

10. What semver bump does renaming a wrapper input require, and why? A major — renaming or removing an input is a breaking change to the contract; app teams pinned with ~> X.Y.Z won’t pick it up until they opt in. The same applies to changing a resource address (also needs a moved), tightening a validation, or changing a default value.

11. How does Renovate fit the AVM upgrade workflow? Renovate understands Terraform registry sources natively. A packageRule matching /^Azure/avm-/ groups AVM bumps into one PR on a schedule; CI attaches the terraform plan so each upgrade is a single reviewable unit — you review upgrades instead of chasing them, and the replace-gate guards the merge.

12. Why layer Azure Policy under the wrapper if the wrapper already enforces guardrails? Defence in depth. The wrapper enforces at author time for anything provisioned through it — but resources created out-of-band (portal, another tool, a non-wrapper module) bypass it. Azure Policy deny/deployIfNotExists enforces at the platform regardless of how a resource was created, catching what the wrapper can’t see.
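Questions 7 and 10 are easier to reason about with the consumer side in view. A minimal sketch of an app repo consuming a wrapper, assuming a hypothetical organisation (example-org), wrapper name, version numbers and inputs; none of these come from a real registry.

```hcl
# Option 1: private registry source. Registry sources accept a version
# constraint, so the three-part pessimistic form works here.
module "keyvault" {
  source  = "app.terraform.io/example-org/keyvault/azurerm"
  version = "~> 1.4.2" # >= 1.4.2, < 1.5.0: patch releases only

  # The narrow contract the wrapper exposes (hypothetical inputs).
  workload = "payments"
  tags = {
    owner       = "team-payments"
    cost_center = "cc-1234"
  }
}

# Option 2: git source pinned to a semver tag. The version argument is not
# supported for git sources, so the pin lives in the ref query parameter and
# every bump is an explicit edit (or a Renovate PR).
module "keyvault_from_git" {
  source = "git::https://github.com/example-org/terraform-azurerm-keyvault.git?ref=v1.4.2"

  workload = "payments"
  tags = {
    owner       = "team-payments"
    cost_center = "cc-1234"
  }
}
```

Neither form mentions an AVM version: the app repo inherits whatever the wrapper release pinned, which is why a renamed input has to be a major release of the wrapper.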

These map to the HashiCorp Terraform Associate (modules, version constraints, state, moved/import) and Azure platform/DevOps exams: AZ-400 (IaC, release pipelines, secure CI) and AZ-104/AZ-305 (governance, landing zones, Policy). A compact cert mapping for revision:

Question theme	Primary cert	Objective area
Module classes, composition, wrapping	Terraform Associate	Use and create modules
Version constraints (`~>`, pre-1.0)	Terraform Associate	Module versioning & sources
`moved` / `import` migration	Terraform Associate	State & refactoring
`terraform test` / Terratest	Terraform Associate / AZ-400	Testing IaC; CI
OIDC keyless CI, replace-gate	AZ-400	Secure pipelines; release gates
Guardrails, Policy backstop, landing zones	AZ-305 / AZ-104	Governance & design

Quick check

1. You pin an AVM module with version = "~> 0.9". A teammate’s terraform init -upgrade pulls 0.10.0 and the plan goes haywire. Why, and what should the constraint have been?
2. A terraform plan fails in a locked-down subscription with an error about a Microsoft.Resources/deployments being denied. What AVM setting is the likely cause and what’s the fix?
3. True or false: the more inputs your wrapper’s variables.tf exposes, the more useful it is to app teams.
4. After a Renovate AVM bump in a shared wrapper, the plan shows 40 storage accounts as destroy/create. Name the root cause and the two-part fix.
5. You’re adopting AVM over a storage account that already exists in Azure but isn’t in Terraform state. Which block do you use, and what should a correct plan show?

Answers

1. The module is pre-1.0, and ~> 0.9 expands to >= 0.9.0, < 1.0.0; because AVM treats the minor segment as breaking below 1.0, that admits a breaking 0.10.0. The constraint should have been exact (0.9.1) in a wrapper, or the three-part ~> 0.9.1 (allows 0.9.x, blocks 0.10.0).
2. enable_telemetry = true (the AVM default): it deploys a tiny empty Microsoft.Resources/deployments, which a policy-locked subscription denies, failing the plan. Fix: set enable_telemetry = false in the wrapper, once, org-wide.
3. False. A wrapper’s value is what it doesn’t expose. Exposing the full AVM surface makes it a passthrough with no guardrails, so an app team could ship a public Key Vault. Expose a narrow, validated contract and inject the rest.
4. Root cause: the AVM bump changed the resource’s address (moved it under a for_each map), so Terraform plans destroy+create. Fix: (a) add a moved block to the new keyed address in the same wrapper version, and (b) gate CI to reject any unacknowledged destroy/create in the plan.
5. Use an import block (to = the AVM resource address, id = the Azure resource id). A correct plan shows the resource imported with only in-place diffs (e.g. AVM’s added diagnostics) and zero destroy/create.

Glossary

Azure Verified Modules (AVM) — a Microsoft-owned, specification-driven set of Terraform/Bicep modules with consistent interfaces and Well-Architected defaults, replacing inconsistent community modules.
Resource module (avm-res-*) — an AVM module provisioning one logical resource plus its directly-dependent child resources; the “brick” you compose.
Pattern module (avm-ptn-*) — an AVM module provisioning a whole multi-resource architecture (e.g. hub-spoke, a landing zone); a “pre-built assembly”.
Platform wrapper — your own pattern module composed from AVM resource bricks that injects org non-negotiables and exposes a narrow contract; a third tier over AVM. You wrap, not fork.
AVM interface — the shared set of optional inputs AVM mandates (tags, lock, role_assignments, diagnostic_settings, private_endpoints, managed_identities, enable_telemetry) that lets you write generic policy.
enable_telemetry — an AVM input (default true) that deploys a tiny empty ARM deployment for usage metrics; sends no resource data, but fails plans where Microsoft.Resources/deployments is policy-denied.
Pre-1.0 (0.x) versioning — AVM resource modules are below 1.0 and treat the minor segment as breaking, which inverts the usual ~> intuition.
~> X.Y.Z (pessimistic constraint) — allows the rightmost segment to increase; ~> 0.9.1 means >= 0.9.1, < 0.10.0, whereas ~> 0.9 means >= 0.9.0, < 1.0.0 (dangerous for 0.x).
validation block — a custom input precondition in variables.tf (with a condition and error_message) that turns conventions into hard plan-time failures.
terraform test — native HCL test runner (*.tftest.hcl) that asserts at plan (or apply) level; fast, free, no deploy — used for contract/shape assertions.
Terratest — a Go testing library that runs a real apply → assert → destroy against live Azure; slow and costs spend — used for end-to-end behaviour, typically nightly.
OIDC workload identity federation — keyless CI auth where the pipeline presents a short-lived federated token scoped to a repo/branch/environment, with no stored long-lived secret.
moved block — a declarative, version-controlled statement that a resource’s address changed from from to to, so Terraform re-points state instead of destroying and recreating.
import block — a declarative statement (to + id) that brings an existing Azure resource under Terraform management without recreating it.
Replace gate — a CI check (e.g. jq over terraform show -json) that fails the build on any unacknowledged destroy/create in the plan.
.terraform.lock.hcl — the dependency lock file pinning provider versions and checksums; commit it for reproducibility across the team and CI.
Private module registry — a registry (Terraform Cloud/Enterprise) or versioned git ref from which consumers pull wrappers by source/version instead of copy-pasting.

Next steps

You can now build, guard, test, distribute and safely upgrade an AVM-based Terraform platform layer. Build outward:

Next: Terraform module design: composition, versioning — the composition theory that underpins clean wrappers and a healthy module graph.
Related: Terraform testing: native & Terratest — go deeper on the two-altitude test strategy this article applies to wrappers.
Related: Terraform refactoring: moved, import & removed blocks — master the migration mechanics that keep every AVM bump a zero-destroy event.
Related: Azure Cloud Adoption Framework landing zones — the guardrail and management-group scaffolding your spokes deploy into.
Related: Checkov, Trivy & tfsec IaC scanning — wire a security scan into the publish pipeline so a wrapper can’t ship a misconfiguration.
Related: GitHub Actions + Terraform OIDC plan/PR automation — the keyless CI pattern that runs plan-on-PR and Terratest without stored secrets.
Related: Bicep private module registry with ACR & CI/CD — the equivalent platform-layer distribution story for teams that prefer Bicep AVM.


Written by Vinod
