Private Endpoints and Private DNS at Scale: A Hub-and-Spoke Resolution Architecture

One private endpoint is easy. Three hundred of them across forty spokes, with on-prem clients that also need to resolve them, is an architecture problem. A private endpoint projects a private NIC for a PaaS resource — a storage account, a Key Vault, an Azure SQL server — into your virtual network and gives it a private IP. The catch is never the IP; it is the name. Your application still connects to the public FQDN baked into its SDK, connection string and TLS certificate, and unless your DNS quietly rewrites that name to the private IP, the traffic leaves over the internet path (or is rejected by the PaaS firewall) and the private endpoint you paid for is never used. Get the DNS design wrong early and you inherit zone sprawl, split-brain resolution, and a steady drip of “the app can’t reach the storage account” tickets that are never an application bug.

This is how to centralize it correctly the first time. We treat private-endpoint name resolution as one mechanism — public FQDN → privatelink.* CNAME → a private A record you host → the endpoint’s private IP — and then we make that mechanism work for hundreds of endpoints, dozens of spokes, and two client populations (in-Azure and on-prem) without copying a single zone more than once. You will host one copy of each privatelink.* zone in the connectivity subscription, project it into every spoke with a VNet link, let Azure Policy auto-bind every new endpoint to it, deploy the Azure DNS Private Resolver so on-prem can forward into Azure, and wire conditional forwarders in both directions. Every configuration carries an az command and a Bicep or Terraform snippet, and because this is a reference you will keep open during a rollout or an incident, the zone names, the failure modes, the limits and the resolution playbook are all laid out as scannable tables.

By the end you will stop guessing at 02:14 when a spoke suddenly resolves a public IP. You will know whether a VNet lost its link, a zone group was never created, a DNS-proxy NVA was rebuilt without its zone links, an on-prem forwarder points at the wrong place, or a leftover local zone is causing split-brain — and you will have the exact nslookup and az ... list to confirm which within ninety seconds. Read the prose once; keep the tables open the rest of the time.

What problem this solves

Private endpoints exist to keep PaaS traffic off the public internet — for compliance (“no public exposure of customer data”), for egress control (everything traverses the hub firewall), and to eliminate the data-exfiltration surface of a public storage endpoint. The networking is the easy 20%. The DNS is the 80% that silently fails, because resolution failures don’t throw — they succeed, returning the wrong (public) answer, and the application connects to the internet endpoint it was always going to connect to. Nothing errors until the PaaS firewall denies the public IP, or until an auditor notices traffic on the public path, or until a regional outage takes the public endpoint down while everyone assumed they were private.

What breaks without a deliberate design: forty spokes each grow their own copy of privatelink.blob.core.windows.net, records drift independently, and a redeploy in spoke 12 silently bypasses its zone group so that one account resolves public while thirty-nine resolve private. On-prem clients — which can never reach Azure’s internal resolver at 168.63.129.16 — get the public IP for everything and nobody notices until a partner integration fails. A DNS-proxy firewall in the hub becomes an undocumented single point of failure for all private resolution, and the day someone rebuilds it from a clean template, every spoke resolves public at the same instant.

Who hits this: every regulated or security-conscious team running PaaS behind private endpoints at landing-zone scale — banks, insurers, healthcare, government. It bites hardest where there are many spokes (record drift, missing links), hybrid connectivity (on-prem can’t see Azure DNS), a DNS-proxy NVA (single point of failure), and services with non-obvious zone names (Key Vault’s vaultcore, AKS’s regional zone, Azure Monitor’s set of five zones). The fix is never “add another zone per spoke” — it is “host one copy centrally, link it everywhere, and let policy enforce it.”

To frame the whole field before the deep dive, here is every failure class this article covers, the question it forces, and the first place to look:

Failure class	What actually happens	First question to ask	First place to look	Most common single cause
Spoke resolves public IP	App connects to internet endpoint or is firewall-denied	Is this VNet linked to the central zone?	`az network private-dns link vnet list`	VNet has no link to the zone
No A record at all	FQDN returns only the public IP, no private	Does the endpoint have a zone group?	`dns-zone-group list` on the PE	Missing/incorrect zone group
On-prem resolves public	Datacenter clients never get the private IP	Does on-prem forward to the resolver inbound IP?	`nslookup` from on-prem	Missing conditional forwarder
Split-brain (random IP)	Same FQDN returns private or public unpredictably	Is there a leftover local zone?	Per-spoke zone inventory	Spoke-local zone + central link both present
All spokes go public at once	Estate-wide private resolution dies	Did the DNS-proxy VNet lose its links?	Links on the NVA/firewall VNet	Proxy VNet rebuilt without zone links
Wrong zone, no resolution	Record written into a zone nobody queries	Is the zone name exactly right for this service?	Zone-name reference table	Regional/special suffix mismatch

Learning objectives

By the end of this article you can:

Explain the full private-endpoint resolution chain (public FQDN → privatelink.* CNAME → private A record → endpoint IP) and name the exact hop at which any failure occurs.
Bind a private endpoint to a centrally-hosted Private DNS zone with a zone group — and explain why manual A records are an anti-pattern at scale.
Host one copy of each privatelink.* zone in the connectivity subscription and project it into every spoke with resolution-only VNet links, in az, Bicep and Terraform.
Enforce auto-binding with an Azure Policy DeployIfNotExists (DINE) assignment at the landing-zone management group, including a remediation task to backfill pre-existing endpoints.
Deploy the Azure DNS Private Resolver with delegated inbound and outbound /28 subnets, and wire conditional forwarding for on-prem→Azure and Azure→on-prem resolution.
Pick the correct Private DNS zone name for any service — including the regional (AKS), multi-zone (Azure Monitor / AMPLS) and oddly-suffixed (Key Vault vaultcore) cases — without guessing.
Diagnose any “the app can’t reach the PaaS resource” ticket as a specific resolution failure and confirm the root cause with one nslookup and one az ... list.

Prerequisites & where this fits

You should already understand the building blocks: a virtual network (VNet) with subnets, VNet peering in a hub-and-spoke topology, what a PaaS firewall (“public network access disabled”) does, and the basics of DNS (A records, CNAMEs, FQDNs, conditional forwarding). You should be comfortable running az in Cloud Shell, reading JSON output, and recognising a private (RFC 1918) address versus a public one. Familiarity with Azure Policy and either Bicep or Terraform lets you take the governance and IaC sections directly to production.

This sits in the Networking & Connectivity track, downstream of the fundamentals and upstream of the landing-zone work. It assumes the VNet mechanics from “Azure Virtual Network basics: subnets, NSGs, peering” and the deeper options in “Azure VNet deep dive: every setting.” It builds directly on “Private Endpoint vs Service Endpoint” (why private endpoints, not service endpoints, are the modern default) and “Private Link and Private DNS for PaaS” (the single-endpoint version of this story). The hybrid half (the Private Resolver and conditional forwarding) is covered standalone in “Azure DNS Private Resolver: hybrid conditional forwarding.” It slots into “Azure landing zone: network topology and connectivity design,” and the governance section leans on “Azure Policy as code.” When the PaaS firewall denies a public IP, the symptom often surfaces in “Troubleshooting storage 403s: firewall, private endpoint, RBAC, SAS.”

A quick map of who owns what during a resolution incident, so you escalate to the right team fast:

Layer	What lives here	Who usually owns it	Failure classes it can cause
Application / SDK	The hard-coded public FQDN, connection string	App / dev team	None directly — it always uses the public name
Spoke VNet + endpoint	The private endpoint NIC, `snet-pe`, the zone group	Spoke / workload team	Missing zone group, unlinked VNet
Central Private DNS	The one copy of each zone, all VNet links, DINE policy	Connectivity / platform	Missing link, wrong zone name, orphaned records
DNS-proxy NVA (if any)	Firewall doing DNS proxy for the spokes	Network / security	Estate-wide failure if its VNet loses links
DNS Private Resolver	Inbound/outbound endpoints, forwarding rulesets	Connectivity / platform	On-prem cannot resolve into Azure
On-prem DNS	Conditional forwarders to the inbound endpoint	On-prem AD / infra	On-prem resolves public; reverse path broken

Core concepts

Six mental models make every later decision obvious.

The name is the whole problem; the IP is trivial. A private endpoint always has a private IP the moment it is created. Your app never asks for that IP directly; it asks for mystorageacct.blob.core.windows.net, because that name is in the SDK default, the connection string and the server certificate’s SAN. DNS is the only thing standing between “uses the private endpoint” and “uses the public internet.” Every failure mode in this article is a variation of “the client could not see the right private record.”

Microsoft pre-builds half the chain; you host the other half. Public Azure DNS already returns a CNAME from the public FQDN to a privatelink.* name: mystorageacct.blob.core.windows.net → mystorageacct.privatelink.blob.core.windows.net. In public DNS that privatelink.* name carries no private record; it simply continues on to the service’s public IP. Your job is to host the privatelink.blob.core.windows.net Private DNS zone with an A record pointing the endpoint’s privatelink name at its private IP. If the client can see that zone, it follows the CNAME and gets the private IP. If it cannot, public DNS answers the privatelink.* name and the client ends up with the public IP.

The default resolver consults every linked zone automatically. Azure’s wire-server resolver lives at the magic, non-routable address 168.63.129.16. Any VM using default DNS in a VNet that is linked to a Private DNS zone will automatically have that zone consulted — no forwarders, no custom DNS, no resolver. So for in-Azure clients, “make this spoke resolve the private IP” reduces to “link this spoke’s VNet to the zone.” That single fact is the backbone of the whole design.

Centralize the zone, link it many times. A Private DNS zone is a global resource that can be linked to up to 1,000 VNets. You therefore host exactly one copy of privatelink.blob.core.windows.net in the connectivity subscription and create one VNet link per spoke. The alternative — one zone per spoke — multiplies every record by the spoke count and creates N independent places for drift. One zone, many links, is non-negotiable at scale.

Zone groups, not hand-written records. A Private DNS zone group is a child object of the private endpoint that tells Azure to manage the A record’s whole lifecycle — write it on creation, update it if the private IP changes, delete it when the endpoint is deleted. A manual A record is correct for exactly as long as nobody redeploys; the first re-creation orphans it. Zone groups can point at a zone in a different subscription, which is precisely how the spoke owns the endpoint while the connectivity subscription owns the zone.

On-prem lives in a different DNS universe. The address 168.63.129.16 is reachable only from inside an Azure VNet. An on-prem server has no path to it, so it can never benefit from a linked zone the way an Azure VM does. To bridge the gap you deploy the Azure DNS Private Resolver (or, historically, DNS-forwarder VMs) which exposes an inbound endpoint — a real private IP, reachable over ExpressRoute/VPN — that on-prem DNS can conditionally forward to. Resolution then happens inside Azure, where the linked zones are visible.

The vocabulary in one table

Before the deep sections, pin down every moving part. The glossary at the end repeats these for lookup; this table is the mental model side by side.

Concept	One-line definition	Where it lives	Why it matters to resolution
Private endpoint (PE)	A private NIC + IP for a PaaS resource	Spoke subnet (`snet-pe`)	The thing whose name must resolve private
Subresource / `group-id`	Which sub-service the PE targets (blob, vault…)	On the PE connection	Wrong one → wrong/no zone, no record
`privatelink.*` zone	The Private DNS zone holding private A records	Connectivity subscription	The half of the chain you host
A record	`privatelink` name → private IP	In the zone	The answer the client actually needs
Zone group	Child object that manages the A record lifecycle	On the PE	Auto-writes/updates/deletes the record
VNet link	Binds a zone to a VNet so it’s consulted	On the zone	If absent, that VNet resolves public
168.63.129.16	Azure’s in-VNet default resolver	Per-VNet (virtual)	Auto-consults all linked zones
DINE policy	DeployIfNotExists policy auto-creating zone groups	Management group	Enforces binding without human action
DNS Private Resolver	Managed in-Azure DNS forwarder service	Hub VNet	Lets on-prem resolve into Azure
Inbound endpoint	Private IP on-prem forwards to	Hub, delegated `/28`	The bridge from on-prem into Azure DNS
Outbound endpoint	Source for queries Azure sends out	Hub, delegated `/28`	Lets Azure resolve on-prem names
Forwarding ruleset	Domain → on-prem DNS server mappings	Hub	Azure→on-prem conditional forwarding
DNS-proxy NVA	Firewall/NVA resolving on the spokes’ behalf	Hub VNet	If unlinked, breaks all resolution

Resolution paths side by side

Clients take one of three routes to the same private IP: in-Azure clients on default DNS, in-Azure clients behind a DNS-proxy NVA, and on-prem hosts. Knowing which path a given client uses tells you immediately which control to check when it breaks:

Client	Default DNS it uses	How it reaches the zone	Extra config needed	Breaks if…
Spoke VM (Azure default DNS)	168.63.129.16	VNet is linked to the central zone	A VNet link	The link is missing
Spoke VM (custom DNS → NVA proxy)	NVA in hub	NVA forwards to 168.63.129.16 in its VNet	NVA VNet linked to all zones	The proxy VNet loses its links
On-prem host	On-prem DNS	Conditional forwarder → resolver inbound IP	Inbound endpoint + forwarder	The forwarder is missing/wrong

Why private endpoints break name resolution

A private endpoint projects a NIC for a PaaS resource into your VNet and gives it a private IP. The problem is the name. Your application still connects to the public FQDN — mystorageacct.blob.core.windows.net, myvault.vault.azure.net — because that name is baked into SDKs, connection strings, and certificates.

Resolve that public FQDN with no private DNS in place and you get the public IP. Traffic leaves over the internet path (or is blocked by the firewall on the PaaS resource) and the private endpoint is never used. The fix is the chain Azure builds for you:

mystorageacct.blob.core.windows.net
  -> CNAME mystorageacct.privatelink.blob.core.windows.net
       -> A 10.x.x.x   (only resolvable if you host the privatelink zone)

Microsoft’s public DNS already returns the privatelink.* CNAME. Your job is to host the privatelink.blob.core.windows.net Private DNS zone with an A record pointing the resource’s private endpoint at its private IP. If the client can see that zone, it follows the CNAME to your private A record. If it cannot, it falls through to the public A record. Every failure mode in this article is a variation of “the client could not see the right zone.”
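
The chain above can be sketched as a tiny in-memory model. This is only an illustration under stated assumptions: `PUBLIC_CNAMES`, `PUBLIC_A`, the public IP `20.60.0.10` and the private IP used in the usage note are made-up stand-ins, not real Azure data, and the real resolver is far more involved.

```python
from typing import Optional

# Hypothetical public DNS data: the service FQDN is a CNAME to the privatelink
# name, and public DNS can still answer that name with a public IP.
PUBLIC_CNAMES = {
    "mystorageacct.blob.core.windows.net":
        "mystorageacct.privatelink.blob.core.windows.net",
}
PUBLIC_A = {"mystorageacct.privatelink.blob.core.windows.net": "20.60.0.10"}


def resolve(fqdn: str, linked_zones: dict[str, dict[str, str]]) -> Optional[str]:
    """Follow the public CNAME, then prefer a record from a linked private zone.

    linked_zones maps a zone name to its {host label: private IP} records and
    stands in for "the zones this client can see".
    """
    name = PUBLIC_CNAMES.get(fqdn, fqdn)
    for zone, records in linked_zones.items():
        if name.endswith("." + zone):
            host = name[: -len(zone) - 1]
            # Assumption: a visible zone with no matching record yields no answer.
            return records.get(host)
    # Zone not visible to this client: the public answer is all it gets.
    return PUBLIC_A.get(name)
```

Calling `resolve("mystorageacct.blob.core.windows.net", {})` models a client with no link to the zone and returns the public IP; passing a linked zone that holds a `mystorageacct` record returns the private IP instead.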

It helps to be precise about what resolves to what at each step, and what a broken answer looks like at that step:

Step	Name being resolved	Healthy answer	Broken answer	What the broken answer means
1	`acct.blob.core.windows.net`	CNAME → `acct.privatelink.blob…`	A → public IP directly	Some custom DNS isn’t returning the CNAME
2	`acct.privatelink.blob.core.windows.net`	A → `10.x.x.x` (private)	NXDOMAIN / public fallthrough	The privatelink zone isn’t visible to this client
3	The private IP `10.x.x.x`	Reachable on 443 over the VNet	Timeout / reset	Routing/NSG/peering issue, not DNS
—	(end) the connection	PaaS sees a private-link source	PaaS firewall denies public source	Resolution returned the public IP

Two reading notes: a public IP in the answer is always a DNS problem (steps 1–2); a private IP that times out is always a network problem (step 3). Never debug them with the same tool — nslookup settles steps 1–2, Test-NetConnection/nc -vz settles step 3.
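
The split between "DNS problem" and "network problem" can be written down as a small triage helper. This is a sketch, not a diagnostic tool: `triage` is a hypothetical name, the reachability flag is something you would get from `Test-NetConnection` or `nc -vz`, and Python's `is_private` covers more ranges than RFC 1918 alone.

```python
import ipaddress


def triage(answer_ip: str, port_443_reachable: bool) -> str:
    """Classify a lookup result as a DNS issue, a network issue, or healthy."""
    ip = ipaddress.ip_address(answer_ip)
    if not ip.is_private:
        # A public IP in the answer means resolution went wrong (steps 1-2).
        return "dns"
    if not port_443_reachable:
        # A private IP that cannot be reached is routing/NSG/peering (step 3).
        return "network"
    return "healthy"
```

The value of the helper is the ordering: check the answer first, and only look at connectivity once the answer is private.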

Zone groups beat manual A records

You can create the A record by hand. Do not. A Private DNS zone group binds a private endpoint to one or more zones so Azure manages the A record lifecycle for you: it writes the record on creation, updates it if the IP changes, and deletes it when the endpoint is deleted. Manual records rot the moment someone redeploys.

# Create the private endpoint (storage blob example)
az network private-endpoint create \
  --name pe-stblob-app1 \
  --resource-group rg-app1 \
  --vnet-name vnet-spoke-app1 --subnet snet-pe \
  --private-connection-resource-id "$STORAGE_ID" \
  --group-id blob \
  --connection-name conn-stblob-app1

# Bind it to the centralized zone via a zone group
az network private-endpoint dns-zone-group create \
  --resource-group rg-app1 \
  --endpoint-name pe-stblob-app1 \
  --name default \
  --private-dns-zone "$ZONE_ID_BLOB" \
  --zone-name privatelink-blob

The --private-dns-zone here is a full resource ID. That ID can point at a zone in a different subscription — which is exactly how we centralize. The spoke owns the endpoint; the connectivity subscription owns the zone.
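
To make the cross-subscription point concrete, here is a minimal sketch of what a value like `$ZONE_ID_BLOB` looks like and how to tell which subscription it belongs to. The helper names and the all-zero subscription IDs are hypothetical placeholders; only the resource ID layout follows the standard ARM format.

```python
def private_dns_zone_id(subscription_id: str, resource_group: str, zone: str) -> str:
    """Build the ARM resource ID of a Private DNS zone."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/privateDnsZones/{zone}"
    )


def subscription_of(resource_id: str) -> str:
    """Extract the subscription ID segment from an ARM resource ID."""
    parts = resource_id.split("/")
    # A leading slash makes parts[0] empty: ['', 'subscriptions', '<id>', ...]
    return parts[2]


# Placeholder IDs for illustration only.
CONNECTIVITY_SUB = "00000000-0000-0000-0000-000000000001"
zone_id = private_dns_zone_id(
    CONNECTIVITY_SUB, "rg-connectivity-dns", "privatelink.blob.core.windows.net"
)
```

Comparing `subscription_of(zone_id)` with the endpoint's own subscription is a quick way to confirm that a zone group really points at the central zone rather than a spoke-local copy.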

The --group-id (the subresource) is per service: blob, file, table, queue, dfs for storage; vault for Key Vault; sqlServer for Azure SQL; mariadbServer, postgresqlServer, and so on. One resource can need several. A storage account using blob and file needs two endpoints, one per subresource, because a storage private endpoint targets a single subresource, and each endpoint carries its own zone group.

Here is the same binding in Bicep, where the zone group is a child resource of the endpoint:

resource pe 'Microsoft.Network/privateEndpoints@2023-11-01' = {
  name: 'pe-stblob-app1'
  location: location
  properties: {
    subnet: { id: peSubnetId }
    privateLinkServiceConnections: [ {
      name: 'conn-stblob-app1'
      properties: {
        privateLinkServiceId: storageId
        groupIds: [ 'blob' ]
      }
    } ]
  }
}

resource zoneGroup 'Microsoft.Network/privateEndpoints/privateDnsZoneGroups@2023-11-01' = {
  parent: pe
  name: 'default'
  properties: {
    privateDnsZoneConfigs: [ {
      name: 'privatelink-blob'
      properties: { privateDnsZoneId: centralBlobZoneId } // ID in the connectivity sub
    } ]
  }
}

The private-endpoint create call has a small set of options that decide whether resolution can even work — get any of these wrong and the zone group has nothing correct to bind to:

Option	Values	Default	When to change	Trade-off / gotcha
`--group-id`	Service-specific (`blob`, `vault`, `sqlServer`…)	none (required)	Always set per subresource	Wrong value → wrong/no zone, no record
`--subnet`	A subnet with PE network policies disabled	none (required)	Dedicate a `snet-pe` per spoke	Forgetting `--disable-private-endpoint-network-policies` blocks NIC placement
`--private-connection-resource-id`	The target PaaS resource ID	none (required)	The resource you’re fronting	Must be a resource that supports Private Link
`--connection-name`	Free text	derived	Name it after the consumer	Shows in the approval list on the target
Approval mode	Auto / Manual	Auto (same tenant)	Manual for cross-tenant/3rd-party	Auto-approval can expose a resource unintentionally
`--ip-config` (static IP)	Dynamic / static	Dynamic	Pin an IP for firewall rules	Static IPs need lifecycle management
`--edge-zone`	An edge zone name	none	Edge/low-latency placement	Niche; most PEs are regional

Why the zone group wins, attribute by attribute against the hand-written record:

Concern	Manual A record	Zone group (managed)
Created when	You remember to	Automatically with the endpoint
IP change (redeploy)	Stale until you fix it	Re-written automatically
Endpoint deleted	Record orphaned	Record deleted automatically
Cross-subscription	You manage RBAC + scripts	Native via the zone resource ID
Multiple subresources	Several records by hand	Multiple configs in one group
Drift risk at scale	High — N hand edits	None — Azure owns the lifecycle
Policy-enforceable	No clean hook	Yes — DINE creates the group
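
The lifecycle difference in the table can be illustrated with a toy model. Everything here is hypothetical (the `PrivateZone` and `Endpoint` classes are in-memory stand-ins, not Azure APIs); the point is only to show how a managed record follows the endpoint while a manual one does not.

```python
class PrivateZone:
    """Toy in-memory stand-in for a privatelink zone."""

    def __init__(self) -> None:
        self.records: dict[str, str] = {}


class Endpoint:
    """Toy private endpoint; managed=True models a zone group."""

    def __init__(self, name: str, ip: str, zone: PrivateZone, managed: bool) -> None:
        self.name, self.ip, self.zone, self.managed = name, ip, zone, managed
        # Either way a record exists on day one: written by the zone group,
        # or typed in by hand once.
        zone.records[name] = ip

    def redeploy(self, new_ip: str) -> None:
        self.ip = new_ip
        if self.managed:
            self.zone.records[self.name] = new_ip  # zone group re-writes it

    def delete(self) -> None:
        if self.managed:
            self.zone.records.pop(self.name, None)  # zone group removes it
```

With `managed=False`, a redeploy leaves the old IP in the zone and a delete leaves the record behind, which is exactly the stale and orphaned state the table describes.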

A single zone group can hold multiple zone configs, which is how one endpoint that needs several zones (Azure Monitor is the clearest case) stays correct:

Scenario	Endpoints needed	`group-id`(s)	Zone group configs	Zones referenced
Storage, blob only	1	`blob`	1	`privatelink.blob.core.windows.net`
Storage, blob + file	2 (one per subresource)	`blob`, `file`	1 per endpoint	blob zone + file zone
Storage, Data Lake (HNS)	1	`dfs`	1	`privatelink.dfs.core.windows.net`
Key Vault	1	`vault`	1	`privatelink.vaultcore.azure.net`
Azure SQL logical server	1	`sqlServer`	1	`privatelink.database.windows.net`
Cosmos DB (multi-region)	1 + 1/region	`Sql`	1 (+ regional records)	`privatelink.documents.azure.com`
Azure Monitor (AMPLS)	1 (the scope)	`azuremonitor`	5	monitor/oms/ods/agentsvc/blob set
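
One way to stop guessing zone names is to keep the mapping as data. The sketch below covers only the single-zone rows from the table above; `GROUP_ID_TO_ZONE` and `zone_for` are hypothetical names, and a real estate would extend the map from the official zone-name reference.

```python
# group-id -> privatelink zone, taken from the table above.
GROUP_ID_TO_ZONE = {
    "blob": "privatelink.blob.core.windows.net",
    "file": "privatelink.file.core.windows.net",
    "dfs": "privatelink.dfs.core.windows.net",
    "vault": "privatelink.vaultcore.azure.net",
    "sqlServer": "privatelink.database.windows.net",
    "Sql": "privatelink.documents.azure.com",  # Cosmos DB
}


def zone_for(group_id: str) -> str:
    """Return the zone for a group-id, failing loudly on anything unknown."""
    try:
        return GROUP_ID_TO_ZONE[group_id]
    except KeyError:
        raise ValueError(f"no zone mapping for group-id {group_id!r}") from None
```

The lookup is deliberately case-sensitive and strict: `sqlServer` and `Sql` are different services, and an unknown group-id should stop a deployment rather than silently write a record into the wrong zone.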

Centralize zones in the connectivity subscription

The anti-pattern is one set of privatelink.* zones per spoke. With forty spokes you would have forty copies of privatelink.blob.core.windows.net, each a separate place for records to drift. Instead, host one copy of each zone in the connectivity (hub) subscription and project it into every spoke with a VNet link.

HUB_RG="rg-connectivity-dns"

# One zone, hosted centrally
az network private-dns zone create \
  --resource-group $HUB_RG \
  --name privatelink.blob.core.windows.net

# Link every VNet that needs to resolve it.
# registration-enabled=false: this is a resolution-only link.
az network private-dns link vnet create \
  --resource-group $HUB_RG \
  --zone-name privatelink.blob.core.windows.net \
  --name link-spoke-app1 \
  --virtual-network "$SPOKE_APP1_VNET_ID" \
  --registration-enabled false

A VNet’s default resolver (168.63.129.16) automatically consults every Private DNS zone linked to that VNet. So a VM in vnet-spoke-app1 querying the blob FQDN walks: public CNAME to privatelink.*, then the linked central zone returns the private A record. For VNet-internal clients that means no forwarders, no resolver and no custom DNS on the spoke. Hybrid clients are covered in the resolver section below.

Set registration-enabled false on these links. Auto-registration is for VM hostnames in a zone such as corp.internal; it has no place in a shared privatelink zone, and a VNet can have it enabled against only one private zone anyway. The distinction matters enough to tabulate:

Link attribute	`registration-enabled = true`	`registration-enabled = false`
Purpose	Auto-register VM hostnames into the zone	Resolution only — read the zone’s records
How many per zone	Up to 100 links (and only one registration zone per VNet)	Up to 1,000 links in total
Use for privatelink zones	Never	Always
Writes records?	Yes (VM A records)	No
Used by	A private “VM DNS” zone like `corp.internal`	Every `privatelink.*` zone

The shared-zone topology has hard ceilings you must design against before you hit spoke 200:

Resource	Limit (per subscription/zone)	What it constrains	Mitigation when approached
VNet links per Private DNS zone	1,000	How many spokes can resolve one zone	Split estates by region/zone copy; second connectivity sub
Private DNS zones per subscription	1,000	How many distinct `privatelink.*` zones	Rarely hit — there are ~40 service zones
Record sets per zone	25,000	How many endpoints share one zone	Comfortable for hundreds of PEs
Records per record set	20	Multi-IP A records (rarely needed)	One PE = one IP normally
Links with registration enabled	100 per zone; 1 registration zone per VNet	Auto-registration scope	Don’t enable it on privatelink zones
Private endpoints per VNet	High (thousands)	PE density per spoke	Spread across spokes by workload
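
The link ceiling is easy to check with arithmetic before you onboard more spokes. A minimal sketch, assuming every spoke plus one hub (or DNS-proxy) VNet is linked to every zone; `link_plan` and its parameters are hypothetical names, and the 1,000 figure comes from the table above.

```python
MAX_LINKS_PER_ZONE = 1000  # VNet links per Private DNS zone (table above)


def link_plan(spokes: int, zones: int, other_linked_vnets: int = 1) -> dict:
    """Return per-zone link usage, total link resources and remaining headroom."""
    per_zone = spokes + other_linked_vnets  # e.g. the hub / DNS-proxy VNet
    return {
        "links_per_zone": per_zone,
        "total_links": per_zone * zones,
        "headroom_per_zone": MAX_LINKS_PER_ZONE - per_zone,
        "within_limit": per_zone <= MAX_LINKS_PER_ZONE,
    }
```

For forty spokes and four zones, `link_plan(40, 4)` reports 41 links per zone and 164 link resources in total, leaving headroom of 959 links per zone.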

In Terraform the central-plus-many-links pattern is just a for_each:

locals {
  privatelink_zones = [
    "privatelink.blob.core.windows.net",
    "privatelink.file.core.windows.net",
    "privatelink.vaultcore.azure.net",
    "privatelink.database.windows.net",
  ]
}

resource "azurerm_private_dns_zone" "zones" {
  for_each            = toset(local.privatelink_zones)
  name                = each.value
  resource_group_name = azurerm_resource_group.dns.name
}

# Cartesian product: every zone linked to every spoke VNet
resource "azurerm_private_dns_zone_virtual_network_link" "links" {
  for_each = {
    for pair in setproduct(local.privatelink_zones, keys(var.spoke_vnets)) :
    "${pair[0]}|${pair[1]}" => { zone = pair[0], vnet = pair[1] }
  }
  name                  = "link-${each.value.vnet}"
  resource_group_name   = azurerm_resource_group.dns.name
  private_dns_zone_name = azurerm_private_dns_zone.zones[each.value.zone].name
  virtual_network_id    = var.spoke_vnets[each.value.vnet]
  registration_enabled  = false
}

The centralized model wins decisively over per-spoke zones on every axis that matters at scale:

Axis	Per-spoke zones (anti-pattern)	Centralized zone + links (this design)
Copies of each zone	One per spoke (×40)	Exactly one
Places a record can drift	N (one per spoke)	One
New-spoke onboarding	Create zones + endpoints + records	Add links (one `for_each` iteration)
Cross-team RBAC	Each spoke owns DNS	Connectivity owns DNS centrally
Policy enforcement	Hard (target floats)	Easy (one zone ID per service)
On-prem resolver target	Ambiguous	One authoritative set of zones
Failure blast radius	Per spoke	Centralized — but link discipline critical
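
Because link discipline is what keeps the centralized model safe, it helps to audit the zone-by-VNet product against what actually exists. The following is a sketch under assumptions: `missing_links` is a hypothetical helper, and the `existing` pairs would in practice be gathered from `az network private-dns link vnet list` for each zone.

```python
from itertools import product


def missing_links(zones, vnets, existing):
    """Return the (zone, vnet) pairs that should be linked but are not."""
    desired = set(product(zones, vnets))
    return sorted(desired - set(existing))


zones = ["privatelink.blob.core.windows.net", "privatelink.vaultcore.azure.net"]
vnets = ["vnet-spoke-app1", "vnet-spoke-app2"]
existing = [
    ("privatelink.blob.core.windows.net", "vnet-spoke-app1"),
    ("privatelink.blob.core.windows.net", "vnet-spoke-app2"),
    ("privatelink.vaultcore.azure.net", "vnet-spoke-app1"),
]
gaps = missing_links(zones, vnets, existing)
```

Here `gaps` contains the single pair for the Key Vault zone and `vnet-spoke-app2`, which is the spoke that would resolve vault names to a public IP.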

Policy-enforced private DNS

Manual zone groups do not scale across teams. The moment a spoke owner creates an endpoint and forgets the zone group, you have a silent public-resolution bug. Azure Policy with a deployIfNotExists (DINE) effect closes the gap: it watches for new private endpoints and auto-creates the zone group pointing at your central zone.

Microsoft ships built-in DINE policies. Search for “Configure private endpoints … to use private DNS zones” in the policy catalog (Microsoft.Authorization/policyDefinitions). There is one per service (for example Blob, Key Vault, SQL), and you typically assign them together as an initiative at the landing-zone management group, each parameterized with the central zone’s resource ID.

The shape of the rule, so you know what it is doing:

{
  "if": {
    "allOf": [
      { "field": "type", "equals": "Microsoft.Network/privateEndpoints" },
      {
        "count": {
          "field": "Microsoft.Network/privateEndpoints/privateLinkServiceConnections[*].groupIds[*]",
          "where": { "field": "...groupIds[*]", "equals": "blob" }
        },
        "greaterOrEquals": 1
      }
    ]
  },
  "then": {
    "effect": "deployIfNotExists",
    "details": {
      "type": "Microsoft.Network/privateEndpoints/privateDnsZoneGroups",
      "roleDefinitionIds": [
        "/providers/Microsoft.Authorization/roleDefinitions/4d97b98b-1d4f-4787-a291-c67834d212e7"
      ],
      "deployment": { "properties": { "..." : "ARM template that creates the zone group" } }
    }
  }
}

The role definition ID above is Network Contributor. The DINE managed identity needs it (plus Private DNS Zone Contributor on the zone) to write the zone group and record. Two operational notes: DINE fires only when a resource is created or updated, so run a remediation task to backfill endpoints that predate the assignment; and because the assignment is parameterized with the central zone ID, every endpoint of that service across every spoke lands in the same zone automatically. That is the whole point: governance, not goodwill.
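
To see what the `if` block of the rule is testing, here is a rough Python rendering of the same condition applied to an endpoint's JSON. It is an approximation for intuition only: `matches_blob_rule` is a hypothetical function, the resource dictionaries are trimmed examples, and the case-insensitive comparison mirrors how policy string conditions such as `equals` are generally evaluated.

```python
def matches_blob_rule(resource: dict) -> bool:
    """Approximate the policy 'if': a private endpoint with a 'blob' group ID."""
    if str(resource.get("type", "")).lower() != "microsoft.network/privateendpoints":
        return False
    connections = resource.get("properties", {}).get("privateLinkServiceConnections", [])
    # Equivalent of the policy 'count' over groupIds[*] where the value is 'blob'.
    blob_count = sum(
        1
        for conn in connections
        for group_id in conn.get("properties", {}).get("groupIds", [])
        if group_id.lower() == "blob"
    )
    return blob_count >= 1


blob_pe = {
    "type": "Microsoft.Network/privateEndpoints",
    "properties": {
        "privateLinkServiceConnections": [{"properties": {"groupIds": ["blob"]}}]
    },
}
vault_pe = {
    "type": "Microsoft.Network/privateEndpoints",
    "properties": {
        "privateLinkServiceConnections": [{"properties": {"groupIds": ["vault"]}}]
    },
}
```

Only `blob_pe` matches, which is why each service needs its own definition (or its own parameter set) pointing at its own zone.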

Choosing the effect for your private-endpoint governance is itself a decision; here is what each gives you:

Effect	What it does	Acts on existing?	When to use
DeployIfNotExists	Auto-creates the missing zone group	With remediation task	The default — make every PE correct automatically
AuditIfNotExists	Flags PEs lacking a zone group, changes nothing	Yes (reports)	Discovery / pre-enforcement phase
Deny	Blocks PE creation that violates a condition	No (prevention)	Forbid PEs in non-approved subnets/subs
Modify	Adds/updates a property (e.g. a tag)	With remediation	Tagging, not zone-group creation
Disabled	Turns the rule off	n/a	Temporarily during migration

The two roles the DINE identity needs, and exactly why:

Role	Scope to grant	What it lets the policy do
Network Contributor	The endpoint’s subscription / RG	Create the `privateDnsZoneGroups` child object
Private DNS Zone Contributor	The central zone (connectivity sub)	Write the A record into the zone

DINE remediation has a predictable lifecycle; knowing each stage stops you from “why didn’t it fix it?” confusion:

Stage	Trigger	What happens	What you do
Assignment created	You assign the initiative	A managed identity is created	Grant it the two roles above
New endpoint appears	Spoke owner creates a PE	DINE evaluates and deploys the zone group	Nothing — it’s automatic
Existing endpoints	(predate the assignment)	Marked non-compliant, not fixed	Create a remediation task to backfill
Compliance drift	Someone deletes a zone group	Flagged non-compliant on next scan	Re-run remediation; a scan only reports, it does not redeploy
Reporting	Continuous	Compliance % in Policy blade	Alert on non-compliant count > 0

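Create the remediation task to backfill the estate. The command below runs against the current subscription, so repeat it per spoke subscription; when the assignment is an initiative, also pass --definition-reference-id to name the definition being remediated: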

az policy remediation create \
  --name remediate-pe-blob-dns \
  --policy-assignment "$ASSIGNMENT_ID" \
  --resource-discovery-mode ReEvaluateCompliance

On-prem and hybrid resolution with the Private Resolver

VNet clients are solved. On-prem clients are not: a server in your datacenter querying mystorageacct.blob.core.windows.net hits its own DNS, gets the public CNAME, and has no way to reach 168.63.129.16 — that address is non-routable outside Azure. You need an in-Azure resolver that on-prem can forward to.

The modern answer is the Azure DNS Private Resolver, a managed service (no DNS VMs to patch). Deploy it in the hub with an inbound endpoint (an IP on-prem forwards to) and an outbound endpoint (for queries Azure sends back out to on-prem).

RESOLVER_RG="rg-connectivity-dns"

az dns-resolver create \
  --name dnspr-hub \
  --resource-group $RESOLVER_RG \
  --location eastus2 \
  --id "$HUB_VNET_ID"

# Inbound: gets a private IP in a dedicated /28 subnet delegated to the resolver
az dns-resolver inbound-endpoint create \
  --dns-resolver-name dnspr-hub \
  --resource-group $RESOLVER_RG \
  --name inbound \
  --location eastus2 \
  --ip-configurations "[{private-ip-allocation-method:Dynamic,id:'$INBOUND_SUBNET_ID'}]"

# Outbound: needs its own delegated /28 subnet
az dns-resolver outbound-endpoint create \
  --dns-resolver-name dnspr-hub \
  --resource-group $RESOLVER_RG \
  --name outbound \
  --location eastus2 \
  --id "$OUTBOUND_SUBNET_ID"

Both endpoints require dedicated subnets delegated to Microsoft.Network/dnsResolvers, minimum /28. Plan IP space for this in the hub up front. The resolver’s pieces, and what each is for:

Resolver component	What it is	Subnet requirement	Direction	Who talks to it
DNS Private Resolver	The managed service object	Lives in the hub VNet	—	Container for the endpoints
Inbound endpoint	A private IP that accepts queries	Delegated `/28`	On-prem → Azure	On-prem DNS conditional forwarders
Outbound endpoint	Source for queries leaving Azure	Delegated `/28`	Azure → on-prem	Forwarding rulesets attach here
Forwarding ruleset	Domain → target-DNS mappings	n/a (logical)	Azure → on-prem	Linked to VNets that should obey it
Ruleset VNet link	Applies a ruleset to a VNet	n/a	—	The VNets whose queries it governs
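Because the two delegated subnets have to be carved out of the hub before anything is deployed, a pre-flight check is cheap insurance. The following is a minimal sketch in Python, not part of any Azure tooling: the hub range 10.10.0.0/22 and the subnet names are hypothetical, and only the /28 minimum and the one-subnet-per-endpoint rule come from the text above.

```python
import ipaddress


def check_resolver_subnets(hub_cidr, subnets):
    """Return a list of problems for candidate resolver endpoint subnets.

    subnets maps a (hypothetical) subnet name to its CIDR.
    """
    hub = ipaddress.ip_network(hub_cidr)
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    problems = []
    for name, net in nets.items():
        # A longer prefix than /28 means fewer addresses than the minimum.
        if net.prefixlen > 28:
            problems.append(f"{name}: /{net.prefixlen} is smaller than the /28 minimum")
        if not net.subnet_of(hub):
            problems.append(f"{name}: {net} is outside hub range {hub}")
    names = sorted(nets)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            # Inbound and outbound endpoints each need their own subnet.
            if nets[first].overlaps(nets[second]):
                problems.append(f"{first} overlaps {second}")
    return problems


plan = {"snet-dnspr-in": "10.10.0.0/28", "snet-dnspr-out": "10.10.0.16/28"}
print(check_resolver_subnets("10.10.0.0/22", plan))
```

An empty list means the plan satisfies the size, containment and no-overlap checks; it says nothing about delegation, which still has to be set on the subnets themselves.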

The Private Resolver vs the legacy DNS-forwarder-VM approach — why the managed service wins for new builds:

Dimension	DNS Private Resolver (managed)	DNS forwarder VMs (legacy)
Patching / OS upkeep	None (PaaS)	You patch Windows/BIND
High availability	Built-in, zone-resilient	You build it (2+ VMs, LB)
Scaling under QPS	Managed (high QPS/endpoint)	Size and scale VMs yourself
Conditional forwarding	Native forwarding rulesets	BIND/Windows config files
Cost model	Per endpoint-hour + queries	VM compute + management time
Subnet need	Two delegated `/28`s	A subnet for the VMs
When still chosen	New builds, almost always	Legacy estates, exotic DNS needs

Conditional forwarding rulesets (Azure to on-prem)

When an Azure workload needs to resolve an on-prem name (db01.corp.local), the resolver’s outbound endpoint sends it to your on-prem DNS via a forwarding ruleset. Each rule maps a domain to target DNS servers; link the ruleset to the VNets that should obey it.

az dns-resolver forwarding-ruleset create \
  --name frs-onprem \
  --resource-group $RESOLVER_RG \
  --location eastus2 \
  --outbound-endpoints "[{id:$OUTBOUND_ENDPOINT_ID}]"

az dns-resolver forwarding-rule create \
  --ruleset-name frs-onprem \
  --resource-group $RESOLVER_RG \
  --name rule-corp-local \
  --domain-name "corp.local." \
  --forwarding-rule-state Enabled \
  --target-dns-servers "[{ip-address:10.50.0.10,port:53},{ip-address:10.50.0.11,port:53}]"

az dns-resolver vnet-link create \
  --ruleset-name frs-onprem \
  --resource-group $RESOLVER_RG \
  --name link-hub \
  --id "$HUB_VNET_ID"

The trailing dot on corp.local. is mandatory — these are fully qualified domain names. A forwarding rule has a small, exact set of fields; getting any of them wrong fails silently:

Rule field	Example	Meaning	Common mistake
`domain-name`	`corp.local.`	The suffix this rule matches	Missing the trailing dot
`forwarding-rule-state`	`Enabled`	Whether the rule is active	Left `Disabled` after testing
`target-dns-servers`	`10.50.0.10:53`	On-prem DNS to forward to	Pointing at a public resolver
Ruleset → outbound endpoint	`$OUTBOUND_ENDPOINT_ID`	Which egress the queries use	Forgetting to attach the endpoint
Ruleset → VNet link	hub + spokes	Which VNets obey the ruleset	Linking the ruleset to no VNet
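The "common mistake" column lends itself to a small lint you can run against rule definitions before they reach the pipeline. This is a sketch under stated assumptions: the dictionary keys (`domain_name`, `state`, `target_dns_servers`) are hypothetical names chosen for illustration, not the shape of any Azure API response, and "public resolver" is approximated as "not an RFC 1918 address".

```python
import ipaddress

# RFC 1918 ranges; on-prem DNS targets are expected to sit in one of these.
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def lint_forwarding_rule(rule, outbound_endpoint_ids, linked_vnet_ids):
    """Return the mistakes from the table above that apply to this rule."""
    findings = []
    if not rule["domain_name"].endswith("."):
        findings.append("domain-name is missing the trailing dot")
    if rule["state"] != "Enabled":
        findings.append("forwarding-rule-state is not Enabled")
    for ip in rule["target_dns_servers"]:
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in RFC1918):
            findings.append(f"target {ip} is not a private (RFC 1918) address")
    if not outbound_endpoint_ids:
        findings.append("ruleset has no outbound endpoint attached")
    if not linked_vnet_ids:
        findings.append("ruleset is linked to no VNet")
    return findings


rule = {
    "domain_name": "corp.local",          # trailing dot deliberately missing
    "state": "Enabled",
    "target_dns_servers": ["10.50.0.10", "10.50.0.11"],
}
for finding in lint_forwarding_rule(rule, ["outbound"], ["hub"]):
    print(finding)
```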

ExpressRoute / VPN inbound resolution (on-prem to Azure)

This is the reverse direction and the one teams forget. For on-prem clients to resolve private endpoints, point a conditional forwarder on your on-prem DNS at the resolver’s inbound endpoint IP, for the public DNS suffixes of the PaaS services.

The subtlety: you forward the public zone names (blob.core.windows.net, vaultcore.azure.net), not the privatelink.* names. On-prem asks for mystorageacct.blob.core.windows.net; the inbound endpoint resolves it inside Azure, where 168.63.129.16 follows the CNAME into your linked privatelink zone and returns the private IP. On-prem never references privatelink directly.

On Windows Server DNS, one forwarder per suffix:

$inbound = "10.10.0.4"   # resolver inbound endpoint IP

"blob.core.windows.net",
"file.core.windows.net",
"vaultcore.azure.net",
"database.windows.net" | ForEach-Object {
  Add-DnsServerConditionalForwarderZone `
    -Name $_ `
    -MasterServers $inbound `
    -ReplicationScope "Forest"
}

Traffic to the inbound endpoint rides your existing ExpressRoute private peering or VPN — the resolver IP is a normal private address in the hub, reachable over the same routes your workloads already use. No public exposure. The direction matrix below is the single thing most teams get backwards — which name you forward, where, and why:

Direction	Configured where	Forward what	Forward to	Net effect
On-prem → Azure PaaS	On-prem DNS (Windows/BIND)	Public suffix (`blob.core.windows.net`)	Resolver inbound IP	On-prem gets the private endpoint IP
Azure → on-prem	Resolver outbound + ruleset	On-prem suffix (`corp.local`)	On-prem DNS servers	Azure workloads resolve internal names
In-Azure → Azure PaaS	Nothing (automatic)	—	168.63.129.16 + linked zone	Spoke VMs already resolve private
On-prem → on-prem	On-prem DNS (unchanged)	—	On-prem DNS	Untouched by this design

A common rollout error is forwarding the wrong name; here is the exact right/wrong list:

You forward (on-prem)	Correct?	Why
`blob.core.windows.net` → inbound IP	Yes	Azure follows the CNAME into the linked privatelink zone
`privatelink.blob.core.windows.net` → inbound IP	No	The privatelink name is internal plumbing; on-prem never asks for it
`*.azure.com` → inbound IP	No	Far too broad; hijacks unrelated resolution
`vaultcore.azure.net` → inbound IP	Yes	Key Vault’s public suffix is `vaultcore`, not `vault`
`core.windows.net` → inbound IP	Risky	Catches every storage service; prefer per-service suffixes
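The right/wrong list can be encoded as a check over the conditional-forwarder zone names exported from on-prem DNS. A minimal sketch: the allowlist holds only the four suffixes used in the PowerShell example above, the broad-suffix map holds only the two rows from the table, and anything else is returned as "review" rather than guessed at.

```python
# Public suffixes this article forwards from on-prem (illustrative, not exhaustive).
KNOWN_GOOD = {
    "blob.core.windows.net",
    "file.core.windows.net",
    "vaultcore.azure.net",
    "database.windows.net",
}
# Suffixes the table rejects or flags because they catch too much.
TOO_BROAD = {"azure.com": "no", "core.windows.net": "risky"}


def check_forwarder(zone_name):
    """Classify an on-prem conditional-forwarder zone as yes / no / risky / review."""
    name = zone_name.lower().rstrip(".")
    if name.startswith("*."):
        name = name[2:]
    # The privatelink name is internal plumbing; on-prem should never forward it.
    if name.startswith("privatelink.") or ".privatelink." in name:
        return "no"
    if name in TOO_BROAD:
        return TOO_BROAD[name]
    if name in KNOWN_GOOD:
        return "yes"
    return "review"


for zone in ["blob.core.windows.net", "privatelink.blob.core.windows.net", "core.windows.net"]:
    print(zone, "->", check_forwarder(zone))
```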

Regional zones and the long zone-name list

The most common rollout bug is using the wrong zone name. Several services use regional or non-obvious zone names, and a few use a different suffix entirely. Get the name wrong and the zone group silently writes records nowhere useful. Reference values you will use constantly:

Service	Subresource (`group-id`)	Private DNS zone name
Blob storage	`blob`	`privatelink.blob.core.windows.net`
File storage	`file`	`privatelink.file.core.windows.net`
Queue storage	`queue`	`privatelink.queue.core.windows.net`
Table storage	`table`	`privatelink.table.core.windows.net`
Data Lake Gen2 (HNS)	`dfs`	`privatelink.dfs.core.windows.net`
Key Vault	`vault`	`privatelink.vaultcore.azure.net`
Azure SQL DB	`sqlServer`	`privatelink.database.windows.net`
SQL Managed Instance	`managedInstance`	`privatelink.{dnszone}.database.windows.net`
Cosmos DB (SQL/Core)	`Sql`	`privatelink.documents.azure.com`
Cosmos DB (MongoDB)	`MongoDB`	`privatelink.mongo.cosmos.azure.com`
PostgreSQL Flexible	`postgresqlServer`	`privatelink.postgres.database.azure.com`
App Service / Functions	`sites`	`privatelink.azurewebsites.net`
Container Registry	`registry`	`privatelink.azurecr.io` (+ regional data zone)
Event Hubs / Service Bus	`namespace`	`privatelink.servicebus.windows.net`
AKS API server	`management`	`privatelink.<region>.azmk8s.io`
Azure Monitor (AMPLS)	`azuremonitor`	`privatelink.monitor.azure.com` (+ companion set)
Azure Cache for Redis	`redisCache`	`privatelink.redis.cache.windows.net`
Azure AI Search	`searchService`	`privatelink.search.windows.net`
Azure OpenAI / AI Services	`account`	`privatelink.openai.azure.com` / `privatelink.cognitiveservices.azure.com`
Azure App Configuration	`configurationStores`	`privatelink.azconfig.io`
Azure Web PubSub / SignalR	`webpubsub` / `signalr`	`privatelink.webpubsub.azure.com` / `privatelink.service.signalr.net`

The four traps in that list deserve their own table, because each has burned a real rollout:

Trap	What people assume	The reality	Consequence if wrong
Key Vault suffix	`privatelink.vault.azure.net`	It is `vaultcore.azure.net`	Zone never matches; always public
AKS regional zone	One global zone	`privatelink.<region>.azmk8s.io` (per region)	Wrong region → no API-server resolution
Azure Monitor (AMPLS)	One `monitor` zone	A set: `monitor`, `oms`, `ods`, `agentsvc`, plus `blob`	Partial telemetry; agents fail silently
Container Registry	One `azurecr.io` zone	Main zone plus a `<region>.data.azurecr.io` zone for image pulls	Logins work, pulls fail
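Three of these traps reduce to "which zone names must exist for this service", which is easy to check against the output of a zone listing. The sketch below is illustrative only: the service keys are hypothetical labels, and the full names of the AMPLS companion zones are written out as I understand them from Microsoft's DNS configuration documentation, so verify them there before relying on this. Container Registry is left out on purpose.

```python
# Full AMPLS zone names: an assumption to verify against Microsoft's
# "Azure Private Endpoint DNS configuration" documentation.
AMPLS_ZONES = [
    "privatelink.monitor.azure.com",
    "privatelink.oms.opinsights.azure.com",
    "privatelink.ods.opinsights.azure.com",
    "privatelink.agentsvc.azure-automation.net",
    "privatelink.blob.core.windows.net",
]


def required_zones(service, region=None):
    """Return the zone names a service needs (service keys are hypothetical)."""
    if service == "keyvault":
        return ["privatelink.vaultcore.azure.net"]
    if service == "aks":
        if not region:
            raise ValueError("AKS private zones are regional; pass region=")
        return [f"privatelink.{region}.azmk8s.io"]
    if service == "ampls":
        return list(AMPLS_ZONES)
    raise KeyError(f"no rule for service {service!r}")


def missing_zones(service, existing_zone_names, region=None):
    """Zones the service needs that are absent from the existing set."""
    existing = {z.lower() for z in existing_zone_names}
    return [z for z in required_zones(service, region) if z not in existing]


print(missing_zones("keyvault", ["privatelink.vault.azure.net"]))
print(missing_zones("ampls", ["privatelink.monitor.azure.com"]))
```

The first call reports the `vaultcore` zone as missing because only the mistaken `vault` name exists; the second lists the four companion zones that a monitor-only deployment lacks.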

When you are unsure, the authoritative list is Microsoft’s “Azure Private Endpoint DNS configuration” doc — treat it as the source of truth and do not guess. Sovereign and Government clouds use entirely different suffixes (*.core.usgovcloudapi.net, *.vaultcore.usgovcloudapi.net, etc.). If you run in those clouds, derive names from that cloud’s documentation. The commercial-vs-sovereign suffix shift is total:

Service	Commercial (public)	US Government cloud
Blob	`privatelink.blob.core.windows.net`	`privatelink.blob.core.usgovcloudapi.net`
Key Vault	`privatelink.vaultcore.azure.net`	`privatelink.vaultcore.usgovcloudapi.net`
Azure SQL	`privatelink.database.windows.net`	`privatelink.database.usgovcloudapi.net`
App Service	`privatelink.azurewebsites.net`	`privatelink.azurewebsites.us`

Architecture at a glance

Follow a single request left to right and the whole design falls into place. A spoke VM (top-left) runs an application that connects to mystorageacct.blob.core.windows.net — the public FQDN, because that is what its SDK and connection string contain. It asks its default resolver, which in any VNet is Azure’s wire server at 168.63.129.16 in the resolution layer. Public Azure DNS returns the CNAME to mystorageacct.privatelink.blob.core.windows.net, and because this spoke’s VNet is linked to the central privatelink.blob.core.windows.net zone in the connectivity subscription, the resolver follows that CNAME straight into the central zone and reads the A record — 10.x.x.4, the private IP of the private endpoint NIC sitting in the spoke’s snet-pe. The app then opens TCP 443 to that private IP, and the storage account (with public access disabled) accepts the connection because it arrives over Private Link. No byte of that traffic ever touched the public internet, and the only thing that made it private was a DNS answer.

The on-prem host (bottom-left) takes a longer path to the same answer: it cannot see 168.63.129.16, so its on-prem DNS conditionally forwards the public suffix to the resolver’s inbound endpoint, the query is resolved inside Azure where the linked zones are visible, and the private IP comes back over ExpressRoute or VPN. The numbered badges mark exactly where this breaks in production. Badge 1 is a spoke whose VNet was never linked — it falls through to the public IP. Badge 2 is the estate-killer: a DNS-proxy NVA in the hub that all spokes forward through, whose own VNet lost its zone links on a rebuild, so every spoke goes public at once. Badge 3 is an endpoint created without a zone group, so no A record is ever written. Badge 4 is on-prem missing its conditional forwarder, resolving public for everything. Badge 5 is split-brain — a leftover spoke-local zone or an orphaned manual record returning a stale, recycled IP. The legend narrates each as symptom · confirm · fix; read it as the field guide for the rest of this article.
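The left-to-right trace, together with badges 1 and 3, can be reproduced with a toy resolver. This is a simplified model for building intuition, not Azure's implementation: public DNS is a hard-coded dictionary, 203.0.113.10 is a documentation address standing in for the storage account's public IP, and 10.20.1.4 is a hypothetical endpoint IP. One behaviour worth noting is modelled explicitly: a linked private zone answers authoritatively, so by default a missing record yields NXDOMAIN rather than the public address, unless internet fallback is enabled on the link.

```python
# Toy stand-in for public Azure DNS (placeholder addresses, see lead-in).
PUBLIC_DNS = {
    "mystorageacct.blob.core.windows.net": ("CNAME", "mystorageacct.privatelink.blob.core.windows.net"),
    "mystorageacct.privatelink.blob.core.windows.net": ("A", "203.0.113.10"),
}


def resolve(fqdn, linked_zones):
    """Resolve fqdn as seen from a VNet.

    linked_zones maps zone name -> {relative record name: private IP} for the
    private zones linked to that VNet. Returns an IP string, or None for NXDOMAIN.
    """
    name = fqdn
    for _ in range(8):  # bound the CNAME chain
        for zone, records in linked_zones.items():
            if name.endswith("." + zone):
                label = name[: -len(zone) - 1]
                # Linked zone is authoritative: missing record -> None (NXDOMAIN).
                return records.get(label)
        rtype, value = PUBLIC_DNS.get(name, (None, None))
        if rtype == "CNAME":
            name = value  # follow the CNAME and check linked zones again
        elif rtype == "A":
            return value
        else:
            return None
    return None


FQDN = "mystorageacct.blob.core.windows.net"
ZONE = "privatelink.blob.core.windows.net"

print(resolve(FQDN, {ZONE: {"mystorageacct": "10.20.1.4"}}))  # linked, record present
print(resolve(FQDN, {}))                                       # badge 1: VNet not linked
print(resolve(FQDN, {ZONE: {}}))                               # badge 3: no zone group, no record
```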

Real-world scenario

Northwind Mutual, a regulated insurer, ran a Palo Alto NVA in the hub as DNS proxy for all forty spokes. Every spoke’s VNet DNS pointed at the firewall’s internal IP; the firewall, in turn, forwarded to Azure’s default resolver. AKS private clusters resolved fine, storage worked, Key Vault worked — for eight months. Then, during a routine firewall version upgrade, the network team rebuilt the firewall’s VNet from a clean Bicep template to pick up a new subnet layout. Within minutes, every workload in every spoke started failing: storage SDKs threw connection errors, the AKS API server became unreachable from pods, and the on-call channel lit up with “is storage down?” across six unrelated product teams at once.

It was not storage. It was DNS, and the blast radius was total because of the proxy topology. The firewall’s own VNet had been linked to the privatelink zones manually, by an az network private-dns link vnet create someone ran during the original migration — a command that lived in nobody’s IaC. The clean rebuild recreated the firewall VNet with no zone links. So the chain collapsed exactly here: spokes forwarded DNS to the firewall (fine), the firewall forwarded to 168.63.129.16 in the firewall’s own VNet (fine), but that VNet now had zero privatelink zones linked — so the resolver had nothing to follow the CNAME into, and returned the public A record for everything. Forty spokes, hundreds of endpoints, public IPs everywhere, simultaneously. Because the storage and SQL firewalls denied the public source IPs, every connection failed closed. The incident ran ninety minutes before someone ran nslookup mystorageacct.blob.core.windows.net from a spoke VM, saw a public address, and realised it was resolution, not the services.

The fix was twofold. First, move the firewall VNet’s zone links into the same for_each that links the spokes, so the DNS-proxy VNet is never special-cased:

locals {
  dns_resolving_vnets = merge(var.spoke_vnets, {
    "hub-firewall" = var.firewall_vnet_id
  })
}

That single merge feeds the existing setproduct link resource, guaranteeing the proxy VNet gets every zone the spokes get — forever, automatically, on every apply. Second, they added an audit-style Azure Policy on Microsoft.Network/privateDnsZones/virtualNetworkLinks checked against an allowlist, so any link created or deleted outside the pipeline raises a non-compliant flag within minutes, and an alert fires on the count. The deeper lesson Northwind took away: when spokes resolve through a DNS-proxy NVA, that NVA’s VNet is the single point of failure for all private resolution — it must carry the full zone-link set, that set belongs in code, and the one resource you can least afford to manage by hand is the one most likely to be created with a quick az command during a migration nobody documents.
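The effect of the merge is easiest to see as a set difference between the links the code wants and the links that exist. The sketch below mirrors the zones × VNets product in Python purely for illustration; it is not the Terraform the team ran, and the spoke names are hypothetical.

```python
from itertools import product


def desired_links(zone_names, resolving_vnets):
    """Every (zone, vnet key) pair: the same Cartesian product setproduct yields."""
    return set(product(zone_names, resolving_vnets))


def missing_links(desired, actual):
    """Links the code expects that do not exist in the estate."""
    return sorted(desired - actual)


zones = ["privatelink.blob.core.windows.net", "privatelink.vaultcore.azure.net"]
vnets = {
    "spoke-01": "<spoke-01 vnet id>",        # hypothetical spokes
    "spoke-02": "<spoke-02 vnet id>",
    "hub-firewall": "<firewall vnet id>",    # the merged-in proxy VNet
}

# State after the clean rebuild: spokes are linked, the firewall VNet is not.
actual = {(z, v) for z in zones for v in ("spoke-01", "spoke-02")}

for zone, vnet in missing_links(desired_links(zones, vnets), actual):
    print(f"missing link: {vnet} -> {zone}")
```

With the proxy VNet inside the same map as the spokes, the rebuild shows up as two missing links on the next plan instead of as an outage.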

Advantages and disadvantages

The centralized hub-and-spoke private-DNS design is the right default at scale, but it concentrates risk that you must consciously manage.

Advantages	Disadvantages
One copy of each zone — no per-spoke drift	The central zone set is a shared dependency for the whole estate
New spoke onboards by adding links (one IaC iteration)	Mis-link or unlink the DNS-proxy VNet and everything breaks at once
Policy auto-binds every endpoint — no human step	DINE remediation needs RBAC and a backfill task; existing PEs aren’t auto-fixed
Connectivity team owns DNS; spokes just create PEs	Cross-subscription RBAC adds a setup step
On-prem resolves into Azure via one managed resolver	Resolver needs two delegated `/28`s planned in hub IP space up front
Failures are deterministic and fast to confirm (`nslookup`)	Resolution failures succeed with the wrong answer — silent until something denies the public IP
Scales to ~1,000 VNet links per zone	Beyond that you split the estate or add a second zone copy
Works identically for storage, KV, SQL, AKS, AMPLS	Each service’s zone name must be exactly right (regional/special suffixes)

When the central model is decisively right: any landing zone with more than a handful of spokes, any regulated workload, any hybrid estate. When you might deviate: a single, isolated VNet with two endpoints and no on-prem clients can host its own zone locally without ceremony — though even then, doing it the central way costs nothing extra and future-proofs the growth. The one thing you never do at scale is the per-spoke-zone anti-pattern; it feels simpler on day one and becomes an unmanageable drift surface by spoke ten.

Hands-on lab

A self-contained walk-through: create a storage account with public access disabled, a spoke VNet, a private endpoint, the central zone, the link, and a zone group — then prove resolution returns a private IP. Run it in a sandbox subscription; the storage account and a small VNet cost pennies for an hour, and teardown removes everything.

1. Variables and resource group.

LOC=eastus2
RG=rg-pe-lab
az group create -n $RG -l $LOC
ACCT="stpelab$RANDOM"

2. Create a VNet with a dedicated private-endpoint subnet.

az network vnet create -g $RG -n vnet-lab --address-prefixes 10.20.0.0/16 \
  --subnet-name snet-pe --subnet-prefixes 10.20.1.0/24
# Disable PE network policies so the endpoint NIC can be placed
az network vnet subnet update -g $RG --vnet-name vnet-lab -n snet-pe \
  --disable-private-endpoint-network-policies true

3. Create a storage account and disable public access.

az storage account create -g $RG -n $ACCT -l $LOC --sku Standard_LRS --kind StorageV2
az storage account update -g $RG -n $ACCT --public-network-access Disabled
STORAGE_ID=$(az storage account show -g $RG -n $ACCT --query id -o tsv)

4. Create the private endpoint for the blob subresource.

az network private-endpoint create -g $RG -n pe-blob \
  --vnet-name vnet-lab --subnet snet-pe \
  --private-connection-resource-id "$STORAGE_ID" \
  --group-id blob --connection-name conn-blob

5. Create the central Private DNS zone and link the VNet. (In production this zone is in the connectivity subscription; here it’s in the same RG for simplicity.)

az network private-dns zone create -g $RG -n privatelink.blob.core.windows.net
az network private-dns link vnet create -g $RG \
  --zone-name privatelink.blob.core.windows.net \
  --name link-lab --virtual-network vnet-lab --registration-enabled false
ZONE_ID=$(az network private-dns zone show -g $RG \
  -n privatelink.blob.core.windows.net --query id -o tsv)

6. Bind the endpoint to the zone with a zone group (lets Azure write and own the A record).

az network private-endpoint dns-zone-group create -g $RG \
  --endpoint-name pe-blob --name default \
  --private-dns-zone "$ZONE_ID" --zone-name privatelink-blob

7. Verify the A record was written automatically.

az network private-dns record-set a list -g $RG \
  --zone-name privatelink.blob.core.windows.net -o table
# Expect an A record for the account name pointing at 10.20.1.x

8. Prove resolution from inside the VNet. Create a tiny VM in the spoke (or use an existing one) and resolve the public FQDN — it must return the private IP:

az vm create -g $RG -n vm-test --image Ubuntu2204 --vnet-name vnet-lab \
  --subnet snet-pe --admin-username azureuser --generate-ssh-keys --size Standard_B1s
az vm run-command invoke -g $RG -n vm-test --command-id RunShellScript \
  --scripts "nslookup ${ACCT}.blob.core.windows.net"
# Expect: canonical name = ...privatelink.blob.core.windows.net ; Address: 10.20.1.x

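A public IP here means the zone isn’t linked to the VNet; re-check step 5. A failed lookup (NXDOMAIN) with the zone linked means no A record exists, usually because the zone group is missing; re-check step 6.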

9. Teardown.

az group delete -n $RG --yes --no-wait

The lab maps one-to-one onto the production pattern; the only differences at scale are where the zone lives (connectivity subscription), how many links exist (one per spoke via for_each), and who creates the zone group (the DINE policy, not you).
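If you want step 8 to pass or fail automatically rather than by eye, the nslookup output can be classified in a few lines. A minimal sketch, assuming Linux-style nslookup output (the resolver's own address is printed with a `#53` port suffix); Windows nslookup formats its output differently and would need a different parser.

```python
import ipaddress

RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def classify_nslookup(output):
    """Return 'private', 'public' or 'no-answer' for Linux-style nslookup output."""
    answers = []
    for line in output.splitlines():
        if not line.startswith("Address") or ":" not in line:
            continue
        value = line.split(":", 1)[1].strip()
        if "#" in value:  # the resolver's own address, e.g. 127.0.0.53#53
            continue
        answers.append(ipaddress.ip_address(value))
    if not answers:
        return "no-answer"
    if all(any(addr in net for net in RFC1918) for addr in answers):
        return "private"
    return "public"


sample = """Server:\t\t127.0.0.53
Address:\t127.0.0.53#53

Non-authoritative answer:
stpelab123.blob.core.windows.net\tcanonical name = stpelab123.privatelink.blob.core.windows.net.
Name:\tstpelab123.privatelink.blob.core.windows.net
Address: 10.20.1.4
"""
print(classify_nslookup(sample))
```

The account name and addresses in `sample` are placeholders; feed the function the text returned by the run-command call instead.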

Common mistakes & troubleshooting

Resolution failures are binary and fast to diagnose once you know the playbook. This is the table to keep open during an incident: the symptom you observe, the root cause, the exact command to confirm it, and the fix. Read the prose under it for the non-obvious ones.

#	Symptom	Root cause	Confirm (exact command / path)	Fix
1	Spoke `nslookup` returns a public IP	This VNet has no link to the central zone	`az network private-dns link vnet list -g $HUB_RG --zone-name <zone>` (spoke absent)	Add a resolution-only link (`--registration-enabled false`)
2	FQDN returns NXDOMAIN (or public, if internet fallback is enabled on the link); zone is linked	Endpoint has no zone group, so no A record	`az network private-endpoint dns-zone-group list -g <rg> --endpoint-name <pe>` (empty)	Create the zone group, or let DINE + remediation backfill
3	Record exists but points at a wrong/old IP	Manual A record orphaned after redeploy	`az network private-dns record-set a list -g $HUB_RG --zone-name <zone>` vs PE IP	Delete the manual record; bind a zone group instead
4	All spokes resolve public at once	DNS-proxy NVA’s VNet lost its zone links	`az network private-dns link vnet list` for the firewall VNet (none)	Re-link the proxy VNet; put it in the spokes’ `for_each`
5	On-prem returns public; spoke returns private	On-prem conditional forwarder missing/wrong	`nslookup <fqdn>` from on-prem; check forwarder targets	Forward the public suffix to the resolver inbound IP
6	On-prem forwarder set, still public	Forwarder points at `privatelink.*` not the public suffix	Inspect on-prem forwarder zone names	Forward `blob.core.windows.net`, not `privatelink.blob…`
7	Resolution random (private or public)	Split-brain: spoke-local zone + central link both present	List zones in the spoke RG/sub for a duplicate	Delete the spoke-local zone; keep only the central one
8	Record written but never used	Wrong zone name for the service (regional/special)	Compare zone name to the reference table	Recreate in the correct zone (`vaultcore`, `<region>.azmk8s.io`…)
9	Key Vault resolves public despite a zone	Used `vault.azure.net` instead of `vaultcore.azure.net`	`az network private-dns zone list` for the exact name	Create `privatelink.vaultcore.azure.net`; rebind
10	AMPLS telemetry partially missing	Only `monitor` zone created, not the full set	Check for `oms`/`ods`/`agentsvc`/`blob` companion zones	Create all five AMPLS zones and link them
11	New endpoint not auto-bound	DINE identity lacks RBAC on the zone	Policy compliance shows the deploy failed	Grant Private DNS Zone Contributor on the zone
12	Old endpoints non-compliant, unfixed	DINE only acts on new resources	Policy assignment shows non-compliant existing PEs	Run a remediation task to backfill
13	Private IP resolves but connection times out	Not DNS — NSG/UDR/peering/firewall blocks 443	`nc -vz <privateIP> 443` from the spoke	Fix routing/NSG (see the VNet troubleshooting article)
14	Storage 403 after going private	Resolution returned public; PaaS firewall denied it	`nslookup` shows public IP → it’s resolution	Fix the DNS link/zone group, not the storage ACL

The non-obvious failures, expanded

The estate-wide failure (row 4) is the one to fear. When spokes use a hub NVA as DNS proxy, that NVA’s VNet must itself be linked to every privatelink zone, because the resolver only consults zones linked to the VNet the query is resolved in. The proxy resolves in its own VNet, so the proxy’s VNet — not the spokes’ — needs the links. Confirm by listing links for the firewall VNet, not the spoke. The fix is to treat the proxy VNet as just another resolving VNet in your IaC, never as a special case (see the real-world scenario).

On-prem forwards the public name, never privatelink (rows 5–6). The whole point of the inbound endpoint is to resolve inside Azure, where 168.63.129.16 will follow the CNAME into the linked privatelink zone. If you forward privatelink.blob.core.windows.net from on-prem, you’ve forwarded the internal plumbing name that on-prem should never reference — and resolution fails. Forward the public suffix (blob.core.windows.net) to the inbound endpoint IP, full stop.

Split-brain is non-deterministic and maddening (row 7). If a spoke still hosts its own privatelink.* zone and is linked to the central one, lookups are answered by whichever the resolver consults first — so the same FQDN returns private sometimes and public other times, often differing between VMs. Pick the central zone, delete the local copies, and add the audit policy from the scenario so a stray local zone is flagged fast.

Orphaned records resolve to recycled IPs (row 3). When a zone group is bypassed or a resource is force-deleted, hand-written A records linger and may resolve to an IP that’s since been reassigned to a different endpoint — a silent cross-wiring. Periodically diff record-sets against live endpoints, and link lists against live VNets; orphans are a real outage source at scale.
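The periodic diff described above is a dictionary comparison once both sides are exported. A sketch under stated assumptions: both inputs are plain name-to-IP mappings you have already built (for example from the record-set listing and from the endpoints' NIC configurations), and the record names and addresses shown are hypothetical.

```python
def find_orphans(zone_records, endpoint_ips):
    """Compare A records in a zone with the IPs of live private endpoints.

    zone_records: {record name: IP currently in the zone}
    endpoint_ips: {record name: IP of the live endpoint}
    Returns {record name: reason} for every record that needs attention.
    """
    orphans = {}
    for name, ip in zone_records.items():
        live_ip = endpoint_ips.get(name)
        if live_ip is None:
            orphans[name] = "no live endpoint for this record"
        elif live_ip != ip:
            orphans[name] = f"stale IP {ip}; endpoint now has {live_ip}"
    return orphans


zone_records = {"stapp1": "10.20.1.4", "stapp2": "10.20.1.5", "stold": "10.20.1.9"}
endpoint_ips = {"stapp1": "10.20.1.4", "stapp2": "10.20.1.7"}

for name, reason in find_orphans(zone_records, endpoint_ips).items():
    print(name, "->", reason)
```

The same comparison works for links against live VNets by swapping the two mappings for sets of VNet IDs.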

The decision table for “is this even a DNS problem?” — run this first, before you touch any zone:

If you see…	It’s probably…	Do this
`nslookup` returns a public IP	A DNS/link/zone-group problem	Work the resolution playbook above
`nslookup` returns a private IP but connection fails	A network problem (NSG/UDR/peering)	`nc -vz <ip> 443`; fix routing, not DNS
Private from spoke, public from on-prem	A conditional-forwarder gap	Fix the on-prem forwarder → inbound IP
Private sometimes, public other times	Split-brain (duplicate zones)	Delete the spoke-local zone
Everything public, everywhere, suddenly	The proxy VNet lost its links	Re-link the NVA/firewall VNet
Public for one service only	Wrong zone name for that service	Check the regional/special-suffix table

Verify

Resolution is binary and easy to test. From a VM in a spoke, the FQDN must resolve to a private (RFC 1918) address:

nslookup mystorageacct.blob.core.windows.net
# Expect:
#   ...canonical name = mystorageacct.privatelink.blob.core.windows.net
#   Address: 10.x.x.x        <- private. Public IP here = broken.

Confirm the central zone actually holds the record, and audit which VNets are linked:

# Record exists and points at the endpoint's private IP?
az network private-dns record-set a list \
  --resource-group $HUB_RG \
  --zone-name privatelink.blob.core.windows.net -o table

# Which VNets can resolve this zone?
az network private-dns link vnet list \
  --resource-group $HUB_RG \
  --zone-name privatelink.blob.core.windows.net \
  --query "[].{name:name, vnet:virtualNetwork.id, reg:registrationEnabled}" -o table

# Endpoint approved and connected?
az network private-endpoint show \
  --name pe-stblob-app1 --resource-group rg-app1 \
  --query "privateLinkServiceConnections[0].privateLinkServiceConnectionState" -o json

From on-prem, the same nslookup against the public FQDN must also return the private IP — proving the conditional forwarder reaches the inbound endpoint. If on-prem returns the public IP but the spoke returns private, the forwarder or the route to the inbound endpoint is the problem, not the zone. The four-quadrant truth table tells you instantly which half of the design is broken:

Spoke result	On-prem result	Verdict	Where to look
Private	Private	Healthy end to end	Nothing — you’re done
Private	Public	In-Azure good; on-prem forwarder broken	On-prem conditional forwarder → inbound IP
Public	Public	Central zone/link broken for everyone	Zone exists? VNet linked? Proxy VNet linked?
Public	Private	Rare; spoke link missing but on-prem path resolves	Add the spoke VNet link
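For a post-change smoke test, the truth table can be applied directly to the two addresses you get back. A minimal sketch: "private" is taken to mean RFC 1918, the verdict strings are copied from the table, and the example addresses are placeholders.

```python
import ipaddress

RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

VERDICTS = {
    ("private", "private"): "Healthy end to end",
    ("private", "public"): "In-Azure good; on-prem forwarder broken",
    ("public", "public"): "Central zone/link broken for everyone",
    ("public", "private"): "Rare; spoke link missing but on-prem path resolves",
}


def kind(ip):
    """Classify an answer as 'private' (RFC 1918) or 'public'."""
    addr = ipaddress.ip_address(ip)
    return "private" if any(addr in net for net in RFC1918) else "public"


def quadrant(spoke_ip, onprem_ip):
    """Map the spoke and on-prem answers for one FQDN to the table's verdict."""
    return VERDICTS[(kind(spoke_ip), kind(onprem_ip))]


print(quadrant("10.20.1.4", "203.0.113.10"))
```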

Best practices

Production-grade rules distilled from running this at landing-zone scale:

#	Practice	Why it matters
1	Host one copy of each `privatelink.*` zone in the connectivity subscription	Eliminates per-spoke drift; one source of truth
2	Link every resolving VNet with `registration-enabled false`	Resolution-only; a VNet may auto-register into only one zone, and a shared `privatelink.*` zone should not be it
3	Bind every endpoint with a zone group, never a manual A record	Azure owns the record lifecycle; no orphans
4	Enforce binding with a DINE policy at the landing-zone management group	Removes the human “remember the zone group” step
5	Run a remediation task after assigning the policy	DINE doesn’t fix pre-existing endpoints automatically
6	Put the DNS-proxy/firewall VNet in the same link `for_each` as spokes	Prevents the estate-wide failure on a rebuild
7	Manage all zone links in IaC; audit-policy any link created out-of-band	The one-off `az` link is the classic SPOF
8	Deploy the DNS Private Resolver (not VMs) with two delegated `/28`s	Managed, HA, no patching; plan IP space up front
9	Forward the public suffix from on-prem to the inbound IP — never `privatelink.*`	Lets Azure follow the CNAME internally
10	Verify regional/special zone names (KV `vaultcore`, AKS region, AMPLS set) against the docs	Wrong name writes records nowhere useful
11	Schedule an orphan/link audit (diff records vs endpoints, links vs VNets)	Surfaces drift before a user files a ticket
12	Test resolution from both a spoke VM and an on-prem host after every change	The four-quadrant table catches half-broken states

Security notes

Private endpoints exist for security; the DNS layer is where that security quietly succeeds or fails.

Control	What to do	Why
Disable public network access on the PaaS resource	`--public-network-access Disabled` on storage/KV/SQL	Without this, the public endpoint stays reachable even with a PE
Least-privilege on the zone	Grant DINE identity only Private DNS Zone Contributor on the zone	Avoid broad Network Contributor at subscription scope
RBAC the connectivity subscription tightly	Only the platform team writes zones/links	DNS is now a shared, estate-wide control plane
Audit zone links as code	Deny/audit links created outside the pipeline	A rogue or deleted link silently breaks/leaks resolution
Approve PE connections deliberately	Use manual approval for cross-tenant/3rd-party PEs	Auto-approval can expose a resource you didn’t intend
Keep the resolver inbound IP private	Reachable only over ExpressRoute/VPN	No public exposure of the DNS bridge
Don’t forward `privatelink.*` from on-prem	Forward only public suffixes	Prevents leaking internal naming and broken resolution
Monitor for public-IP regressions	Alert if a known PE FQDN ever resolves public	Catches a dropped link before data takes the public path

The subtle security failure mode: a resolution bug doesn’t open a port — it sends your “private” PaaS traffic out the public path, where the PaaS firewall denies it (fail-closed, the good case) or, if public access was never disabled, silently allows it (fail-open, the data-exfiltration case). Disabling public network access turns every DNS regression into a loud failure instead of a silent leak.

Cost & sizing

The DNS layer itself is cheap; the cost conversation is mostly about the private endpoints and the resolver. Rough figures (verify current pricing for your region):

Item	Unit	Rough cost (USD)	Rough cost (INR)	Notes
Private DNS zone	Per zone / month	~$0.50	~₹42	Negligible even at ~40 service zones
Private DNS queries	Per million queries	~$0.40	~₹33	Most estates are well within noise
VNet link	Per link	Free	Free	Link freely — no per-link charge
Private endpoint	Per endpoint / hour	~$0.01/hr (~$7.30/mo)	~₹600/mo	The real cost driver at hundreds of PEs
PE data processing	Per GB	~$0.01/GB	~₹0.83/GB	Inbound + outbound through the PE
DNS Private Resolver endpoint	Per endpoint / hour	~$0.10/hr each	~₹8/hr each	Two endpoints (in + out) in the hub
DNS Private Resolver queries	Per million	~$0.40	~₹33	Only on-prem-bound/forwarded queries

Sizing guidance, by estate scale:

Estate size	Private endpoints	Zones	VNet links	Resolver needed?	Dominant cost
Single workload	2–10	2–4	1–2	No (in-Azure only)	The endpoints
Small landing zone	20–80	5–10	5–15	If hybrid	The endpoints
Large landing zone	200–800	10–20	40–200	Yes (hybrid)	The endpoints + resolver
Multi-region	800+	per-region copies	up to 1,000/zone	Yes, per region	Endpoints + regional resolvers

The cost levers worth knowing: VNet links are free, so never economize on linking; the private endpoints dominate the bill, so consolidate where a single endpoint with multiple subresources suffices; and the resolver’s two endpoints are a fixed ~$4.80/day in the hub (2 × ~$0.10/hr × 24 h) regardless of estate size, a minor line item against hundreds of endpoints. There is no free tier for private endpoints, but the lab in this article runs for well under a dollar in an hour and tears down cleanly.
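To see how the line items compare for a given estate, the rough unit prices from the table can be multiplied out. This is back-of-the-envelope arithmetic only: the defaults are the approximate USD figures quoted above, 730 hours per month is an assumption, and data processing and query charges are deliberately left out.

```python
HOURS_PER_MONTH = 730  # assumed average month length in hours


def monthly_estimate(private_endpoints, zones, resolver_endpoints=2,
                     pe_per_hour=0.01, zone_per_month=0.50, resolver_per_hour=0.10):
    """Rough monthly USD cost per line item, using the table's approximate prices."""
    return {
        "private_endpoints": round(private_endpoints * pe_per_hour * HOURS_PER_MONTH, 2),
        "zones": round(zones * zone_per_month, 2),
        "resolver": round(resolver_endpoints * resolver_per_hour * HOURS_PER_MONTH, 2),
    }


estimate = monthly_estimate(private_endpoints=300, zones=15)
for item, usd in estimate.items():
    print(f"{item}: ~${usd:,.2f}/month")
```

Swap in current regional prices before using the result for anything beyond a sanity check.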

Interview & exam questions

Mapped to AZ-700 (Designing and Implementing Microsoft Azure Networking Solutions), AZ-305 (Designing Microsoft Azure Infrastructure Solutions) and AZ-104 (Microsoft Azure Administrator).

1. Why does a private endpoint require Private DNS, when it already has a private IP? Because the application connects by the public FQDN (baked into SDKs, connection strings and certificates), not the IP. Without a Private DNS zone holding the private A record, that FQDN resolves to the public IP and the endpoint is bypassed. DNS is the only thing that redirects the name to the private IP.

2. What is the full resolution chain for mystorageacct.blob.core.windows.net with a private endpoint? Public Azure DNS returns a CNAME to mystorageacct.privatelink.blob.core.windows.net; that name is resolved by your hosted privatelink.blob.core.windows.net zone, which holds an A record to the endpoint’s private IP. If the client can’t see that zone, it falls through to the public A record.

3. Why use a zone group instead of creating the A record manually? A zone group makes Azure manage the record’s entire lifecycle — write on creation, update on IP change, delete on endpoint deletion. Manual records become stale or orphaned the moment a resource is redeployed, and they’re not policy-enforceable.

4. How do you avoid forty copies of the same privatelink zone across forty spokes? Host one copy in the connectivity subscription and create a VNet link per spoke (registration-enabled false). A VNet’s default resolver consults every linked zone, so one zone serves all spokes with no duplication.

5. What does registration-enabled false mean and why is it required here? It makes the link resolution-only: the VNet can read the zone’s records but doesn’t auto-register VM hostnames into it. A VNet can have auto-registration enabled against only one Private DNS zone, and auto-registration has no place in a shared privatelink zone.

6. How do you enforce that every new endpoint gets the correct zone group automatically? Assign a DeployIfNotExists (DINE) Azure Policy at the landing-zone management group, parameterized with the central zone’s resource ID. It auto-creates the zone group for new endpoints. Pre-existing endpoints need a remediation task.

7. Which two RBAC roles does the DINE managed identity need, and why? Network Contributor (to create the privateDnsZoneGroups child object on the endpoint) and Private DNS Zone Contributor on the central zone (to write the A record). Missing either makes the policy deployment fail.

8. How do on-prem clients resolve a private endpoint, given 168.63.129.16 is unreachable from on-prem? Deploy the Azure DNS Private Resolver with an inbound endpoint, and configure on-prem conditional forwarders to send the public suffix (e.g. blob.core.windows.net) to that inbound IP. Resolution then happens inside Azure where the linked zones are visible.

9. Which name do you forward from on-prem — the public suffix or the privatelink name? The public suffix. Azure follows the CNAME into the linked privatelink zone itself; on-prem should never reference the privatelink.* name directly.

10. A storage account behind a private endpoint suddenly returns 403 to an app. First check? nslookup the FQDN from the app’s host. A public IP means resolution broke (missing link or zone group) and the storage firewall denied the public source — fix the DNS, not the storage ACL. A private IP that times out is a network (NSG/UDR) problem instead.

11. Why is a DNS-proxy NVA in the hub a single point of failure for private resolution? Because the resolver consults only the zones linked to the VNet where the query is resolved — and with a proxy, that’s the NVA’s VNet, not the spokes’. If the NVA’s VNet loses its zone links, every spoke that forwards through it resolves public at once.

12. Name three services with non-obvious Private DNS zone names. Key Vault uses privatelink.vaultcore.azure.net (not vault.azure.net); AKS private clusters use a regional privatelink.<region>.azmk8s.io; Azure Monitor Private Link Scope needs a set of zones (monitor, oms, ods, agentsvc, blob).
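
Questions 2, 4 and 11 all turn on one rule: the answer depends on which zones are linked to the VNet where the query is resolved. The sketch below is a toy Python model of that rule, not Azure's resolver. The VNet names, the 10.20.1.4 private IP and the 203.0.113.10 public stand-in are all hypothetical, and the NXDOMAIN branch assumes a VNet link without internet fallback.

```python
from dataclasses import dataclass, field

# Toy stand-in for public Azure DNS. Every name and IP here is made up.
PUBLIC_DNS = {
    "mystorageacct.blob.core.windows.net":
        ("CNAME", "mystorageacct.privatelink.blob.core.windows.net"),
    "mystorageacct.privatelink.blob.core.windows.net":
        ("A", "203.0.113.10"),  # stand-in for the service's public IP
}


@dataclass
class PrivateZone:
    name: str
    a_records: dict = field(default_factory=dict)   # label -> private IP
    linked_vnets: set = field(default_factory=set)  # VNets that hold a link


def resolve(fqdn, resolving_vnet, zones):
    """Answer fqdn as seen from resolving_vnet in this simplified model."""
    name = fqdn
    for _ in range(8):  # bound the CNAME chase
        for zone in zones:
            suffix = "." + zone.name
            if resolving_vnet in zone.linked_vnets and name.endswith(suffix):
                # A linked zone answers for its own names. A missing record
                # gives None (NXDOMAIN) here, not the public answer.
                return zone.a_records.get(name[: -len(suffix)])
        rtype, value = PUBLIC_DNS[name]
        if rtype == "A":
            return value
        name = value  # follow the CNAME
    raise RuntimeError("CNAME chain too long")


zone = PrivateZone(
    name="privatelink.blob.core.windows.net",
    a_records={"mystorageacct": "10.20.1.4"},
    linked_vnets={"hub", "spoke-01"},
)
fqdn = "mystorageacct.blob.core.windows.net"

print(resolve(fqdn, "spoke-01", [zone]))  # -> 10.20.1.4 (linked spoke)
print(resolve(fqdn, "spoke-02", [zone]))  # -> 203.0.113.10 (no link)
```

To model spokes that forward through a DNS proxy in the hub, pass "hub" as resolving_vnet. Remove "hub" from linked_vnets and every spoke behind the proxy gets the public answer, whatever its own links say.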

Quick check

1. With no Private DNS zone in place, what does mystorageacct.blob.core.windows.net resolve to from a spoke VM, and what happens to the traffic?
2. You created a private endpoint but the FQDN still does not resolve to its private IP, even though the zone is linked. What single object is most likely missing?
3. Why must on-prem conditional forwarders target the public suffix (e.g. vaultcore.azure.net) and not privatelink.vaultcore.azure.net?
4. Your estate uses a hub firewall as DNS proxy. After a firewall VNet rebuild, every spoke resolves public. What broke?
5. A spoke VM resolves the private IP, but the app still can’t connect on 443. Is this a DNS problem? How do you confirm?

Answers

1. The public IP. The traffic leaves over the internet path (or is denied by the PaaS firewall) and the private endpoint is bypassed; DNS is the only thing that would have redirected the name to the private IP.
2. The zone group on the private endpoint. Without it, no A record is written into the zone, so the linked zone has nothing to return for the privatelink name. By default a linked zone with no matching record answers NXDOMAIN; the client gets the public IP only if the link has internet fallback enabled or the query bypasses the linked zone.
3. Because Azure’s resolver follows the public name’s CNAME into the linked privatelink zone itself; on-prem should resolve the public name inside Azure and never reference the internal privatelink.* plumbing. Forwarding privatelink.* breaks the chain.
4. The firewall (proxy) VNet lost its zone links. The resolver consults zones linked to the VNet where it resolves, which is the proxy’s VNet, and a clean rebuild dropped manually-created links, so the resolver had no privatelink zones to follow the CNAME into.
5. No. A private IP in the answer means DNS is correct. Confirm with nc -vz <privateIP> 443 from the spoke; a failure points at NSG/UDR/peering/firewall, not the zone.
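
The first and last answers above reduce to a two-way split on what nslookup returned. A minimal Python sketch of that split follows. The function name `triage` and its wording are hypothetical, and it assumes your endpoints live in RFC 1918 space, so adjust the ranges if your VNets use other prefixes.

```python
import ipaddress

# Assumption: private endpoints in this estate draw from RFC 1918 ranges.
RFC1918 = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(addr: str) -> bool:
    """True when addr falls inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)


def triage(answer: str) -> tuple:
    """Map the IP nslookup returned for the public FQDN to (layer, next check)."""
    if is_rfc1918(answer):
        return ("network", "DNS is correct; test TCP 443, then NSG/UDR/peering/firewall")
    return ("dns", "public answer; check VNet links, the proxy VNet's links, the zone group")


print(triage("10.20.1.4"))     # hypothetical private endpoint IP
print(triage("203.0.113.10"))  # hypothetical public answer
```

The explicit range list is deliberate: Python's `ip.is_private` also returns True for documentation and other reserved ranges, which would misclassify some public-looking answers.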

Glossary

Term	Definition
Private endpoint	A NIC with a private IP that projects a PaaS resource into your VNet via Private Link.
Subresource / group-id	The specific sub-service a private endpoint targets (e.g. `blob`, `vault`, `sqlServer`).
Private Link	The Azure backbone path that carries traffic to a private endpoint without traversing the internet.
`privatelink.*` zone	The Private DNS zone you host that contains the private A records for endpoints of a service.
Private DNS zone	A zone hosted in Azure (not internet-published) resolved by VNet default DNS when linked.
A record	The DNS record mapping the `privatelink` FQDN to the endpoint’s private IP.
Zone group	A child object of a private endpoint that makes Azure manage the A record’s lifecycle.
VNet link	The binding that makes a Private DNS zone resolvable from a given virtual network.
Registration-enabled	A link flag; `true` auto-registers VM hostnames (a VNet can have only one such registration zone), `false` is resolution-only.
168.63.129.16	Azure’s per-VNet wire-server resolver that auto-consults all linked Private DNS zones.
DeployIfNotExists (DINE)	An Azure Policy effect that auto-deploys a missing resource (here, the zone group).
Remediation task	A Policy job that brings pre-existing non-compliant resources into compliance.
DNS Private Resolver	A managed Azure service that forwards DNS between on-prem and Azure without DNS VMs.
Inbound endpoint	A private IP on the resolver that on-prem DNS conditionally forwards queries to.
Outbound endpoint	The resolver’s egress point for queries Azure forwards out to on-prem DNS.
Forwarding ruleset	A set of domain→target-DNS rules applied (via VNet links) to govern Azure→on-prem resolution.
Conditional forwarder	An on-prem DNS rule sending a specific suffix to a chosen DNS server (here, the inbound IP).
Split-brain DNS	Non-deterministic resolution caused by the same name existing in two zones that are both visible to a client.
DNS-proxy NVA	A firewall/appliance resolving DNS on the spokes’ behalf; its VNet must carry all zone links.
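
Three glossary terms chain together in practice: the group-id picks the `privatelink.*` zone, and the zone name minus its `privatelink.` prefix is the public suffix an on-prem conditional forwarder targets. Here is a small Python sketch of that chain for the three example group-ids above. The helper name is hypothetical, and the strip-the-prefix rule is only claimed for these three zones, not for every service.

```python
# Group-id -> Private DNS zone, for the three example subresources only.
ZONE_FOR_GROUP_ID = {
    "blob": "privatelink.blob.core.windows.net",
    "vault": "privatelink.vaultcore.azure.net",
    "sqlServer": "privatelink.database.windows.net",
}

PREFIX = "privatelink."


def forwarder_suffix(zone_name: str) -> str:
    """Return the public suffix an on-prem conditional forwarder should target."""
    if not zone_name.startswith(PREFIX):
        raise ValueError(f"not a privatelink zone: {zone_name}")
    return zone_name[len(PREFIX):]


for group_id, zone in ZONE_FOR_GROUP_ID.items():
    print(f"{group_id:10} zone={zone:36} forward={forwarder_suffix(zone)}")
```

Keeping the mapping as data makes it easy to diff against the zone list in the connectivity subscription when a new service is onboarded.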

Next steps

Go deeper on the hybrid half with Azure DNS Private Resolver: hybrid conditional forwarding — rulesets, both directions, and on-prem integration.
Cement the why-private-endpoints decision with Private Endpoint vs Service Endpoint and the single-endpoint pattern in Private Link and Private DNS for PaaS.
Place this design inside the broader topology with Azure landing zone: network topology and connectivity and the Azure VNet deep dive: every setting.
Operationalize the governance with Azure Policy as code so a new spoke is fully resolvable the moment it is vended.
When resolution looks fine but connectivity fails, work Troubleshooting VNet connectivity: NSG, UDR, effective routes, Network Watcher and Troubleshooting storage 403s: firewall, private endpoint, RBAC, SAS.