
Private Endpoints and Private DNS at Scale: A Hub-and-Spoke Resolution Architecture

One private endpoint is easy. Three hundred of them across forty spokes, with on-prem clients that also need to resolve them, is an architecture problem. A private endpoint projects a private NIC for a PaaS resource — a storage account, a Key Vault, an Azure SQL server — into your virtual network and gives it a private IP. The catch is never the IP; it is the name. Your application still connects to the public FQDN baked into its SDK, connection string and TLS certificate, and unless your DNS quietly rewrites that name to the private IP, the traffic leaves over the internet path (or is rejected by the PaaS firewall) and the private endpoint you paid for is never used. Get the DNS design wrong early and you inherit zone sprawl, split-brain resolution, and a steady drip of “the app can’t reach the storage account” tickets that are never an application bug.

This is how to centralize it correctly the first time. We treat private-endpoint name resolution as one mechanism (public FQDN → privatelink.* CNAME → a private A record you host → the endpoint’s private IP) and then we make that mechanism work for hundreds of endpoints, dozens of spokes, and two client populations (in-Azure and on-prem) without hosting more than one copy of any zone. You will host one copy of each privatelink.* zone in the connectivity subscription, project it into every spoke with a VNet link, let Azure Policy auto-bind every new endpoint to it, deploy the Azure DNS Private Resolver so on-prem can forward into Azure, and wire conditional forwarders in both directions. Every configuration carries an az command and a Bicep or Terraform snippet, and because this is a reference you will keep open during a rollout or an incident, the zone names, the failure modes, the limits and the resolution playbook are all laid out as scannable tables.

By the end you will stop guessing at 02:14 when a spoke suddenly resolves a public IP. You will know whether a VNet lost its link, a zone group was never created, a DNS-proxy NVA was rebuilt without its zone links, an on-prem forwarder points at the wrong place, or a leftover local zone is causing split-brain — and you will have the exact nslookup and az ... list to confirm which within ninety seconds. Read the prose once; keep the tables open the rest of the time.

What problem this solves

Private endpoints exist to keep PaaS traffic off the public internet — for compliance (“no public exposure of customer data”), for egress control (everything traverses the hub firewall), and to eliminate the data-exfiltration surface of a public storage endpoint. The networking is the easy 20%. The DNS is the 80% that silently fails, because resolution failures don’t throw — they succeed, returning the wrong (public) answer, and the application connects to the internet endpoint it was always going to connect to. Nothing errors until the PaaS firewall denies the public IP, or until an auditor notices traffic on the public path, or until a regional outage takes the public endpoint down while everyone assumed they were private.

What breaks without a deliberate design: forty spokes each grow their own copy of privatelink.blob.core.windows.net, records drift independently, and a redeploy in spoke 12 silently bypasses its zone group so that one account resolves public while thirty-nine resolve private. On-prem clients — which can never reach Azure’s internal resolver at 168.63.129.16 — get the public IP for everything and nobody notices until a partner integration fails. A DNS-proxy firewall in the hub becomes an undocumented single point of failure for all private resolution, and the day someone rebuilds it from a clean template, every spoke resolves public at the same instant.

Who hits this: every regulated or security-conscious team running PaaS behind private endpoints at landing-zone scale — banks, insurers, healthcare, government. It bites hardest where there are many spokes (record drift, missing links), hybrid connectivity (on-prem can’t see Azure DNS), a DNS-proxy NVA (single point of failure), and services with non-obvious zone names (Key Vault’s vaultcore, AKS’s regional zone, Azure Monitor’s set of five zones). The fix is never “add another zone per spoke” — it is “host one copy centrally, link it everywhere, and let policy enforce it.”

To frame the whole field before the deep dive, here is every failure class this article covers, the question it forces, and the first place to look:

| Failure class | What actually happens | First question to ask | First place to look | Most common single cause |
|---|---|---|---|---|
| Spoke resolves public IP | App connects to internet endpoint or is firewall-denied | Is this VNet linked to the central zone? | az network private-dns link vnet list | VNet has no link to the zone |
| No A record at all | FQDN returns only the public IP, no private | Does the endpoint have a zone group? | dns-zone-group list on the PE | Missing/incorrect zone group |
| On-prem resolves public | Datacenter clients never get the private IP | Does on-prem forward to the resolver inbound IP? | nslookup from on-prem | Missing conditional forwarder |
| Split-brain (random IP) | Same FQDN returns private or public unpredictably | Is there a leftover local zone? | Per-spoke zone inventory | Spoke-local zone + central link both present |
| All spokes go public at once | Estate-wide private resolution dies | Did the DNS-proxy VNet lose its links? | Links on the NVA/firewall VNet | Proxy VNet rebuilt without zone links |
| Wrong zone, no resolution | Record written into a zone nobody queries | Is the zone name exactly right for this service? | Zone-name reference table | Regional/special suffix mismatch |

Learning objectives

By the end of this article you can:

Prerequisites & where this fits

You should already understand the building blocks: a virtual network (VNet) with subnets, VNet peering in a hub-and-spoke topology, what a PaaS firewall (“public network access disabled”) does, and the basics of DNS (A records, CNAMEs, FQDNs, conditional forwarding). You should be comfortable running az in Cloud Shell, reading JSON output, and recognising a private (RFC 1918) address versus a public one. Familiarity with Azure Policy and either Bicep or Terraform lets you take the governance and IaC sections directly to production.

This sits in the Networking & Connectivity track, downstream of the fundamentals and upstream of the landing-zone work. It assumes the VNet mechanics from the Azure Virtual Network basics: subnets, NSGs, peering and the deeper options in the Azure VNet deep dive: every setting. It builds directly on Private Endpoint vs Service Endpoint (why private endpoints, not service endpoints, are the modern default) and Private Link and Private DNS for PaaS (the single-endpoint version of this story). The hybrid half — the Private Resolver and conditional forwarding — is covered standalone in Azure DNS Private Resolver: hybrid conditional forwarding. It slots into the Azure landing zone: network topology and connectivity design, and the governance section leans on Azure Policy as code. When the PaaS firewall denies a public IP, the symptom often surfaces in Troubleshooting storage 403s: firewall, private endpoint, RBAC, SAS.

A quick map of who owns what during a resolution incident, so you escalate to the right team fast:

| Layer | What lives here | Who usually owns it | Failure classes it can cause |
|---|---|---|---|
| Application / SDK | The hard-coded public FQDN, connection string | App / dev team | None directly; it always uses the public name |
| Spoke VNet + endpoint | The private endpoint NIC, snet-pe, the zone group | Spoke / workload team | Missing zone group, unlinked VNet |
| Central Private DNS | The one copy of each zone, all VNet links, DINE policy | Connectivity / platform | Missing link, wrong zone name, orphaned records |
| DNS-proxy NVA (if any) | Firewall doing DNS proxy for the spokes | Network / security | Estate-wide failure if its VNet loses links |
| DNS Private Resolver | Inbound/outbound endpoints, forwarding rulesets | Connectivity / platform | On-prem cannot resolve into Azure |
| On-prem DNS | Conditional forwarders to the inbound endpoint | On-prem AD / infra | On-prem resolves public; reverse path broken |

Core concepts

Six mental models make every later decision obvious.

The name is the whole problem; the IP is trivial. A private endpoint always has a private IP the moment it is created. Your app never asks for that IP directly; it asks for mystorageacct.blob.core.windows.net, because that name is in the SDK default, the connection string and the server certificate’s SAN. DNS is the only thing standing between “uses the private endpoint” and “uses the public internet.” Every failure mode in this article is a variation of “the client could not see the right private record.”

Microsoft pre-builds half the chain; you host the other half. Public Azure DNS already returns a CNAME from the public FQDN to a privatelink.* name: mystorageacct.blob.core.windows.net → mystorageacct.privatelink.blob.core.windows.net. In public DNS, that privatelink.* name leads only to the service’s public IP. Your job is to host the privatelink.blob.core.windows.net Private DNS zone with an A record pointing the endpoint’s privatelink name at its private IP. If the client can see that zone, it follows the CNAME and gets the private IP. If it cannot, resolution continues through public DNS and ends at the public A record.

The default resolver consults every linked zone automatically. Azure’s wire-server resolver lives at the magic, non-routable address 168.63.129.16. Any VM using default DNS in a VNet that is linked to a Private DNS zone will automatically have that zone consulted — no forwarders, no custom DNS, no resolver. So for in-Azure clients, “make this spoke resolve the private IP” reduces to “link this spoke’s VNet to the zone.” That single fact is the backbone of the whole design.

Centralize the zone, link it many times. A Private DNS zone is a global resource that can be linked to up to 1,000 VNets. You therefore host exactly one copy of privatelink.blob.core.windows.net in the connectivity subscription and create one VNet link per spoke. The alternative — one zone per spoke — multiplies every record by the spoke count and creates N independent places for drift. One zone, many links, is non-negotiable at scale.

Zone groups, not hand-written records. A Private DNS zone group is a child object of the private endpoint that tells Azure to manage the A record’s whole lifecycle — write it on creation, update it if the private IP changes, delete it when the endpoint is deleted. A manual A record is correct for exactly as long as nobody redeploys; the first re-creation orphans it. Zone groups can point at a zone in a different subscription, which is precisely how the spoke owns the endpoint while the connectivity subscription owns the zone.

On-prem lives in a different DNS universe. The address 168.63.129.16 is reachable only from inside an Azure VNet. An on-prem server has no path to it, so it can never benefit from a linked zone the way an Azure VM does. To bridge the gap you deploy the Azure DNS Private Resolver (or, historically, DNS-forwarder VMs) which exposes an inbound endpoint — a real private IP, reachable over ExpressRoute/VPN — that on-prem DNS can conditionally forward to. Resolution then happens inside Azure, where the linked zones are visible.

The vocabulary in one table

Before the deep sections, pin down every moving part. The glossary at the end repeats these for lookup; this table is the mental model side by side.

| Concept | One-line definition | Where it lives | Why it matters to resolution |
|---|---|---|---|
| Private endpoint (PE) | A private NIC + IP for a PaaS resource | Spoke subnet (snet-pe) | The thing whose name must resolve private |
| Subresource / group-id | Which sub-service the PE targets (blob, vault…) | On the PE connection | Wrong one → wrong/no zone, no record |
| privatelink.* zone | The Private DNS zone holding private A records | Connectivity subscription | The half of the chain you host |
| A record | privatelink name → private IP | In the zone | The answer the client actually needs |
| Zone group | Child object that manages the A record lifecycle | On the PE | Auto-writes/updates/deletes the record |
| VNet link | Binds a zone to a VNet so it’s consulted | On the zone | If absent, that VNet resolves public |
| 168.63.129.16 | Azure’s in-VNet default resolver | Per-VNet (virtual) | Auto-consults all linked zones |
| DINE policy | DeployIfNotExists policy auto-creating zone groups | Management group | Enforces binding without human action |
| DNS Private Resolver | Managed in-Azure DNS forwarder service | Hub VNet | Lets on-prem resolve into Azure |
| Inbound endpoint | Private IP on-prem forwards to | Hub, delegated /28 | The bridge from on-prem into Azure DNS |
| Outbound endpoint | Source for queries Azure sends out | Hub, delegated /28 | Lets Azure resolve on-prem names |
| Forwarding ruleset | Domain → on-prem DNS server mappings | Hub | Azure→on-prem conditional forwarding |
| DNS-proxy NVA | Firewall/NVA resolving on the spokes’ behalf | Hub VNet | If unlinked, breaks all resolution |

Resolution paths side by side

The three client populations each take a different route to the same private IP. Knowing which path a given client uses tells you immediately which control to check when it breaks:

| Client | Default DNS it uses | How it reaches the zone | Extra config needed | Breaks if… |
|---|---|---|---|---|
| Spoke VM (Azure default DNS) | 168.63.129.16 | VNet is linked to the central zone | A VNet link | The link is missing |
| Spoke VM (custom DNS → NVA proxy) | NVA in hub | NVA forwards to 168.63.129.16 in its VNet | NVA VNet linked to all zones | The proxy VNet loses its links |
| On-prem host | On-prem DNS | Conditional forwarder → resolver inbound IP | Inbound endpoint + forwarder | The forwarder is missing/wrong |

Why private endpoints break name resolution

A private endpoint projects a NIC for a PaaS resource into your VNet and gives it a private IP. The problem is the name. Your application still connects to the public FQDN — mystorageacct.blob.core.windows.net, myvault.vault.azure.net — because that name is baked into SDKs, connection strings, and certificates.

Resolve that public FQDN with no private DNS in place and you get the public IP. Traffic leaves over the internet path (or is blocked by the firewall on the PaaS resource) and the private endpoint is never used. The fix is the chain Azure builds for you:

mystorageacct.blob.core.windows.net
  -> CNAME mystorageacct.privatelink.blob.core.windows.net
       -> A 10.x.x.x   (only resolvable if you host the privatelink zone)

Microsoft’s public DNS already returns the privatelink.* CNAME. Your job is to host the privatelink.blob.core.windows.net Private DNS zone with an A record pointing the resource’s private endpoint at its private IP. If the client can see that zone, it follows the CNAME to your private A record. If it cannot, it falls through to the public A record. Every failure mode in this article is a variation of “the client could not see the right zone.”
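To make the chain concrete, here is a minimal Python sketch of the lookup as a toy in-memory model. It is not how Azure DNS is implemented: the two dictionaries, the `resolve` helper and both IP addresses are illustrative assumptions, and the public side is simplified to a single hop after the CNAME.

```python
# Toy in-memory model of the lookup chain; every name and IP here is illustrative.
PUBLIC_DNS = {
    # Microsoft-hosted public records (simplified to one hop after the CNAME)
    "mystorageacct.blob.core.windows.net": ("CNAME", "mystorageacct.privatelink.blob.core.windows.net"),
    "mystorageacct.privatelink.blob.core.windows.net": ("A", "203.0.113.10"),  # stand-in public IP
}

PRIVATE_ZONE = {
    # The record you host in privatelink.blob.core.windows.net
    "mystorageacct.privatelink.blob.core.windows.net": ("A", "10.20.1.5"),
}


def resolve(name, zone_visible, max_hops=5):
    """Follow CNAMEs to an A record; a visible private zone wins over public DNS."""
    for _ in range(max_hops):
        record = PRIVATE_ZONE.get(name) if zone_visible else None
        if record is None:
            record = PUBLIC_DNS.get(name)
        if record is None:
            return None  # no such name in this toy model
        record_type, value = record
        if record_type == "A":
            return value
        name = value  # CNAME: restart the lookup with the target name
    return None


fqdn = "mystorageacct.blob.core.windows.net"
print(resolve(fqdn, zone_visible=True))   # answer when the VNet is linked to the zone
print(resolve(fqdn, zone_visible=False))  # answer when the zone is not visible
```

The only input that changes between the two calls is whether the client can see the private zone, which is the point of the whole article: the application and the FQDN stay identical.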

It helps to be precise about what resolves to what at each step, and what a broken answer looks like at that step:

| Step | Name being resolved | Healthy answer | Broken answer | What the broken answer means |
|---|---|---|---|---|
| 1 | acct.blob.core.windows.net | CNAME → acct.privatelink.blob… | A → public IP directly | Some custom DNS isn’t returning the CNAME |
| 2 | acct.privatelink.blob.core.windows.net | A → 10.x.x.x (private) | NXDOMAIN / public fallthrough | The privatelink zone isn’t visible to this client |
| 3 | The private IP 10.x.x.x | Reachable on 443 over the VNet | Timeout / reset | Routing/NSG/peering issue, not DNS |
| (end) | the connection | PaaS sees a private-link source | PaaS firewall denies public source | Resolution returned the public IP |

Two reading notes: a public IP in the answer is always a DNS problem (steps 1–2); a private IP that times out is always a network problem (step 3). Never debug them with the same tool — nslookup settles steps 1–2, Test-NetConnection/nc -vz settles step 3.
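The two reading notes can be turned into a small triage helper. The sketch below assumes your private endpoints live in RFC 1918 space (adjust the list if your estate uses other ranges); the `next_step` name and its return labels are hypothetical.

```python
import ipaddress

# Assumption: private endpoints are addressed from RFC 1918 space.
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def next_step(answer_ip):
    """Map the IP returned by nslookup to the layer worth debugging next."""
    ip = ipaddress.ip_address(answer_ip)
    if any(ip in net for net in RFC1918):
        # DNS did its job; check routing, NSGs and peering with a TCP test on 443.
        return "network"
    # A public answer means the client never saw the privatelink zone.
    return "dns"


print(next_step("10.20.1.5"))
print(next_step("20.60.0.10"))
```

An explicit RFC 1918 list is used rather than `ipaddress`'s `is_private` property, because that property also flags documentation and other special-purpose ranges that are not what this article means by "private".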

Zone groups beat manual A records

You can create the A record by hand. Do not. A Private DNS zone group binds a private endpoint to one or more zones so Azure manages the A record lifecycle for you: it writes the record on creation, updates it if the IP changes, and deletes it when the endpoint is deleted. Manual records rot the moment someone redeploys.

# Create the private endpoint (storage blob example)
az network private-endpoint create \
  --name pe-stblob-app1 \
  --resource-group rg-app1 \
  --vnet-name vnet-spoke-app1 --subnet snet-pe \
  --private-connection-resource-id "$STORAGE_ID" \
  --group-id blob \
  --connection-name conn-stblob-app1

# Bind it to the centralized zone via a zone group
az network private-endpoint dns-zone-group create \
  --resource-group rg-app1 \
  --endpoint-name pe-stblob-app1 \
  --name default \
  --private-dns-zone "$ZONE_ID_BLOB" \
  --zone-name privatelink-blob

The --private-dns-zone here is a full resource ID. That ID can point at a zone in a different subscription — which is exactly how we centralize. The spoke owns the endpoint; the connectivity subscription owns the zone.
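A quick way to confirm that an endpoint really is bound to a zone in another subscription is to compare the subscription segment of the two resource IDs. This is a minimal sketch; the helper names and the placeholder subscription GUIDs are assumptions, and the parser handles only resource-group-scoped IDs.

```python
def parse_resource_id(resource_id):
    """Split a resource-group-scoped ARM resource ID into the parts this check needs."""
    parts = resource_id.strip("/").split("/")
    if len(parts) < 8 or parts[0].lower() != "subscriptions" or parts[2].lower() != "resourcegroups":
        raise ValueError(f"not a resource-group-scoped resource ID: {resource_id!r}")
    return {"subscription": parts[1], "resource_group": parts[3], "name": parts[-1]}


def is_cross_subscription(endpoint_id, zone_id):
    """True when the endpoint and the zone live in different subscriptions."""
    endpoint_sub = parse_resource_id(endpoint_id)["subscription"].lower()
    zone_sub = parse_resource_id(zone_id)["subscription"].lower()
    return endpoint_sub != zone_sub


# Placeholder IDs: the GUIDs are made up, the names follow the examples above.
PE_ID = (
    "/subscriptions/00000000-0000-0000-0000-000000000001/resourceGroups/rg-app1"
    "/providers/Microsoft.Network/privateEndpoints/pe-stblob-app1"
)
ZONE_ID = (
    "/subscriptions/00000000-0000-0000-0000-000000000002/resourceGroups/rg-connectivity-dns"
    "/providers/Microsoft.Network/privateDnsZones/privatelink.blob.core.windows.net"
)

print(is_cross_subscription(PE_ID, ZONE_ID))
print(parse_resource_id(ZONE_ID)["name"])
```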

The --group-id (sometimes called the subresource) is per service: blob, file, table, queue, dfs for storage; vault for Key Vault; sqlServer for Azure SQL; mariadbServer, postgresqlServer, and so on. One resource can need several: a storage account serving both blob and file needs two endpoints, one per subresource, each with its own zone group pointing at the matching zone.

Here is the same binding in Bicep, where the zone group is a child resource of the endpoint:

resource pe 'Microsoft.Network/privateEndpoints@2023-11-01' = {
  name: 'pe-stblob-app1'
  location: location
  properties: {
    subnet: { id: peSubnetId }
    privateLinkServiceConnections: [ {
      name: 'conn-stblob-app1'
      properties: {
        privateLinkServiceId: storageId
        groupIds: [ 'blob' ]
      }
    } ]
  }
}

resource zoneGroup 'Microsoft.Network/privateEndpoints/privateDnsZoneGroups@2023-11-01' = {
  parent: pe
  name: 'default'
  properties: {
    privateDnsZoneConfigs: [ {
      name: 'privatelink-blob'
      properties: { privateDnsZoneId: centralBlobZoneId } // ID in the connectivity sub
    } ]
  }
}

The private-endpoint create call has a small set of options that decide whether resolution can even work — get any of these wrong and the zone group has nothing correct to bind to:

| Option | Values | Default | When to change | Trade-off / gotcha |
|---|---|---|---|---|
| --group-id | Service-specific (blob, vault, sqlServer…) | none (required) | Always set per subresource | Wrong value → wrong/no zone, no record |
| --subnet | A subnet with PE network policies disabled | none (required) | Dedicate a snet-pe per spoke | Forgetting --disable-private-endpoint-network-policies blocks NIC placement |
| --private-connection-resource-id | The target PaaS resource ID | none (required) | The resource you’re fronting | Must be a resource that supports Private Link |
| --connection-name | Free text | derived | Name it after the consumer | Shows in the approval list on the target |
| Approval mode | Auto / Manual | Auto (same tenant) | Manual for cross-tenant/3rd-party | Auto-approval can expose a resource unintentionally |
| --ip-config (static IP) | Dynamic / static | Dynamic | Pin an IP for firewall rules | Static IPs need lifecycle management |
| --edge-zone | An edge zone name | none | Edge/low-latency placement | Niche; most PEs are regional |

Why the zone group wins, attribute by attribute against the hand-written record:

| Concern | Manual A record | Zone group (managed) |
|---|---|---|
| Created when | You remember to | Automatically with the endpoint |
| IP change (redeploy) | Stale until you fix it | Re-written automatically |
| Endpoint deleted | Record orphaned | Record deleted automatically |
| Cross-subscription | You manage RBAC + scripts | Native via the zone resource ID |
| Multiple subresources | Several records by hand | Multiple configs in one group |
| Drift risk at scale | High (N hand edits) | None (Azure owns the lifecycle) |
| Policy-enforceable | No clean hook | Yes (DINE creates the group) |
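The lifecycle difference is easiest to see in a toy model. The sketch below is purely illustrative: the `Zone` and `Endpoint` classes are hypothetical stand-ins, with `managed=True` playing the role of a zone group and `managed=False` a hand-written record.

```python
class Zone:
    """Stand-in for a Private DNS zone: record name -> IP."""

    def __init__(self):
        self.records = {}


class Endpoint:
    """Stand-in for a private endpoint, with or without a zone group."""

    def __init__(self, name, ip, zone, managed):
        self.name, self.ip, self.zone, self.managed = name, ip, zone, managed
        zone.records[name] = ip  # both approaches start with a correct record

    def redeploy(self, new_ip):
        self.ip = new_ip
        if self.managed:  # only a zone group rewrites the record
            self.zone.records[self.name] = new_ip

    def delete(self):
        if self.managed:  # only a zone group cleans up after itself
            self.zone.records.pop(self.name, None)


zone = Zone()
by_hand = Endpoint("acct1", "10.20.1.5", zone, managed=False)
grouped = Endpoint("acct2", "10.20.1.6", zone, managed=True)

by_hand.redeploy("10.20.1.9")
grouped.redeploy("10.20.1.10")
print(zone.records)  # acct1 is now stale, acct2 follows the new IP

by_hand.delete()
grouped.delete()
print(zone.records)  # acct1 is orphaned, acct2 is gone
```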

A single zone group can hold multiple zone configs, which is how one endpoint whose records span several zones (Azure Monitor is the usual case) stays correct:

| Scenario | Endpoints needed | group-id(s) | Zone group configs | Zones referenced |
|---|---|---|---|---|
| Storage, blob only | 1 | blob | 1 | privatelink.blob.core.windows.net |
| Storage, blob + file | 2 (one per subresource) | blob, file | 1 per endpoint | blob zone + file zone |
| Storage, Data Lake (HNS) | 1 | dfs | 1 | privatelink.dfs.core.windows.net |
| Key Vault | 1 | vault | 1 | privatelink.vaultcore.azure.net |
| Azure SQL logical server | 1 | sqlServer | 1 | privatelink.database.windows.net |
| Cosmos DB (multi-region) | 1 + 1/region | Sql | 1 (+ regional records) | privatelink.documents.azure.com |
| Azure Monitor (AMPLS) | 1 (the scope) | azuremonitor | 5 | monitor/oms/ods/agentsvc/blob set |
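The table above lends itself to a lookup that pipelines can call before creating a zone group, so a typo in the group-id fails loudly instead of writing into the wrong zone. This is a sketch covering only the services listed here; the `zones_for` helper is hypothetical, and the five Azure Monitor names are the full forms of the table's shorthand, which you should verify against the current zone-name reference.

```python
# group-id -> zone(s) the zone group must reference (only the services tabulated above)
ZONES_BY_GROUP_ID = {
    "blob": ["privatelink.blob.core.windows.net"],
    "file": ["privatelink.file.core.windows.net"],
    "dfs": ["privatelink.dfs.core.windows.net"],
    "vault": ["privatelink.vaultcore.azure.net"],
    "sqlServer": ["privatelink.database.windows.net"],
    "Sql": ["privatelink.documents.azure.com"],
    "azuremonitor": [
        "privatelink.monitor.azure.com",
        "privatelink.oms.opinsights.azure.com",
        "privatelink.ods.opinsights.azure.com",
        "privatelink.agentsvc.azure-automation.net",
        "privatelink.blob.core.windows.net",
    ],
}


def zones_for(group_id):
    """Return the zones for a group-id, or fail loudly on an unknown one."""
    try:
        return ZONES_BY_GROUP_ID[group_id]
    except KeyError:
        raise ValueError(
            f"unknown group-id {group_id!r}; check the zone-name reference first"
        ) from None


print(zones_for("vault"))
print(len(zones_for("azuremonitor")))
```

Keys are matched exactly, so `Sql` (Cosmos DB) and `sqlServer` (Azure SQL) stay distinct, which mirrors how easily the two are confused in practice.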

Centralize zones in the connectivity subscription

The anti-pattern is one set of privatelink.* zones per spoke. With forty spokes you would have forty copies of privatelink.blob.core.windows.net, each a separate place for records to drift. Instead, host one copy of each zone in the connectivity (hub) subscription and project it into every spoke with a VNet link.

HUB_RG="rg-connectivity-dns"

# One zone, hosted centrally
az network private-dns zone create \
  --resource-group $HUB_RG \
  --name privatelink.blob.core.windows.net

# Link every VNet that needs to resolve it.
# registration-enabled=false: this is a resolution-only link.
az network private-dns link vnet create \
  --resource-group $HUB_RG \
  --zone-name privatelink.blob.core.windows.net \
  --name link-spoke-app1 \
  --virtual-network "$SPOKE_APP1_VNET_ID" \
  --registration-enabled false

A VNet’s default resolver (168.63.129.16) automatically consults every Private DNS zone linked to that VNet. So a VM in vnet-spoke-app1 querying the blob FQDN walks: public CNAME to privatelink.*, then the linked central zone returns the private A record. For VNet-internal clients that means no forwarders, no resolver, and no custom DNS on the spoke. Hybrid clients are covered in the resolver section.

Set registration-enabled false on these links. Auto-registration exists to publish VM hostnames into a zone such as corp.internal; it has no place in a shared privatelink zone, and a VNet can have it enabled for only one zone anyway. The distinction matters enough to tabulate:

| Link attribute | registration-enabled = true | registration-enabled = false |
|---|---|---|
| Purpose | Auto-register VM hostnames into the zone | Resolution only: read the zone’s records |
| How many | Up to 100 VNets per zone; only one such zone per VNet | Up to 1,000 links per zone in total |
| Use for privatelink zones | Never | Always |
| Writes records? | Yes (VM A records) | No |
| Used by | A private “VM DNS” zone like corp.internal | Every privatelink.* zone |
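Because a missing link is the most common cause of a spoke resolving public, it is worth auditing links as a set difference. The sketch below assumes you have already collected, per zone, the names of the linked VNets (for example from `az network private-dns link vnet list`); the input shape and the `missing_links` name are hypothetical.

```python
def missing_links(zone_links, spoke_vnets):
    """Return {zone: [spokes with no link]} for every zone that has gaps."""
    gaps = {}
    for zone, linked_vnets in zone_links.items():
        absent = sorted(set(spoke_vnets) - set(linked_vnets))
        if absent:
            gaps[zone] = absent
    return gaps


spokes = ["vnet-spoke-app1", "vnet-spoke-app2", "vnet-spoke-app3"]
links = {
    "privatelink.blob.core.windows.net": ["vnet-spoke-app1", "vnet-spoke-app2", "vnet-spoke-app3"],
    "privatelink.vaultcore.azure.net": ["vnet-spoke-app1"],
}

for zone, absent in missing_links(links, spokes).items():
    print(f"{zone}: not linked to {', '.join(absent)}")
```

An empty result means every spoke can see every zone; anything else is a list of spokes that will resolve the public IP for that service.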

The shared-zone topology has hard ceilings you must design against before you hit spoke 200:

| Resource | Limit (per subscription/zone) | What it constrains | Mitigation when approached |
|---|---|---|---|
| VNet links per Private DNS zone | 1,000 | How many spokes can resolve one zone | Split estates by region/zone copy; second connectivity sub |
| Private DNS zones per subscription | 1,000 | How many distinct privatelink.* zones | Rarely hit; there are ~40 service zones |
| Record sets per zone | 25,000 | How many endpoints share one zone | Comfortable for hundreds of PEs |
| Records per record set | 20 | Multi-IP A records (rarely needed) | One PE = one IP normally |
| Links with registration enabled | 100 per zone; one zone per VNet | Auto-registration scope | Don’t enable it on privatelink zones |
| Private endpoints per VNet | High (thousands) | PE density per spoke | Spread across spokes by workload |
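A rough headroom check against the first three ceilings is simple arithmetic. This sketch uses the limits quoted in the table and treats one endpoint as one record set, which is an approximation (some services write more than one record per endpoint); the `headroom` helper is hypothetical.

```python
# Limits as quoted in the table above; confirm current values before relying on them.
LIMITS = {
    "links_per_zone": 1000,
    "zones_per_subscription": 1000,
    "record_sets_per_zone": 25000,
}


def headroom(spokes, zones, endpoints_in_busiest_zone):
    """Remaining capacity per limit; a negative number means the design does not fit."""
    usage = {
        "links_per_zone": spokes,  # one link per spoke on each zone
        "zones_per_subscription": zones,
        "record_sets_per_zone": endpoints_in_busiest_zone,  # approx. one record set per PE
    }
    return {name: LIMITS[name] - used for name, used in usage.items()}


print(headroom(spokes=40, zones=40, endpoints_in_busiest_zone=300))
```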

In Terraform the central-plus-many-links pattern is just a for_each:

locals {
  privatelink_zones = [
    "privatelink.blob.core.windows.net",
    "privatelink.file.core.windows.net",
    "privatelink.vaultcore.azure.net",
    "privatelink.database.windows.net",
  ]
}

resource "azurerm_private_dns_zone" "zones" {
  for_each            = toset(local.privatelink_zones)
  name                = each.value
  resource_group_name = azurerm_resource_group.dns.name
}

# Cartesian product: every zone linked to every spoke VNet
resource "azurerm_private_dns_zone_virtual_network_link" "links" {
  for_each = {
    for pair in setproduct(local.privatelink_zones, keys(var.spoke_vnets)) :
    "${pair[0]}|${pair[1]}" => { zone = pair[0], vnet = pair[1] }
  }
  name                  = "link-${each.value.vnet}"
  resource_group_name   = azurerm_resource_group.dns.name
  private_dns_zone_name = azurerm_private_dns_zone.zones[each.value.zone].name
  virtual_network_id    = var.spoke_vnets[each.value.vnet]
  registration_enabled  = false
}
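To see what the Terraform `setproduct` expression above expands to, the same Cartesian product can be sketched in Python. The spoke keys `app1` and `app2` are hypothetical stand-ins for the keys of `var.spoke_vnets`.

```python
from itertools import product

privatelink_zones = [
    "privatelink.blob.core.windows.net",
    "privatelink.file.core.windows.net",
    "privatelink.vaultcore.azure.net",
    "privatelink.database.windows.net",
]
spoke_keys = ["app1", "app2"]  # hypothetical spoke names


def link_keys(zones, spokes):
    """Mirror the for_each map: one 'zone|vnet' key per zone/spoke pair."""
    return {f"{zone}|{vnet}": {"zone": zone, "vnet": vnet} for zone, vnet in product(zones, spokes)}


keys = link_keys(privatelink_zones, spoke_keys)
print(len(keys))  # zones x spokes link resources
for key in sorted(keys)[:2]:
    print(key)
```

The count grows as zones times spokes, so forty spokes across these four zones is 160 link resources, which is why the links are generated rather than written by hand.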

The centralized model wins decisively over per-spoke zones on every axis that matters at scale:

| Axis | Per-spoke zones (anti-pattern) | Centralized zone + links (this design) |
|---|---|---|
| Copies of each zone | One per spoke (×40) | Exactly one |
| Places a record can drift | N (one per spoke) | One |
| New-spoke onboarding | Create zones + endpoints + records | Add links (one for_each iteration) |
| Cross-team RBAC | Each spoke owns DNS | Connectivity owns DNS centrally |
| Policy enforcement | Hard (target floats) | Easy (one zone ID per service) |
| On-prem resolver target | Ambiguous | One authoritative set of zones |
| Failure blast radius | Per spoke | Centralized, but link discipline critical |

Policy-enforced private DNS

Manual zone groups do not scale across teams. The moment a spoke owner creates an endpoint and forgets the zone group, you have a silent public-resolution bug. Azure Policy with a deployIfNotExists (DINE) effect closes the gap: it watches for new private endpoints and auto-creates the zone group pointing at your central zone.

Microsoft ships built-in DINE policies: search for “Configure private endpoints … to use private DNS zones” in the policy catalog (Microsoft.Authorization/policyDefinitions). There is one per service (for example Blob, Key Vault, SQL), and you typically assign them together as an initiative at the landing-zone management group, each parameterized with the central zone’s resource ID.

The shape of the rule, so you know what it is doing:

{
  "if": {
    "allOf": [
      { "field": "type", "equals": "Microsoft.Network/privateEndpoints" },
      {
        "count": {
          "field": "Microsoft.Network/privateEndpoints/privateLinkServiceConnections[*].groupIds[*]",
          "where": { "field": "...groupIds[*]", "equals": "blob" }
        },
        "greaterOrEquals": 1
      }
    ]
  },
  "then": {
    "effect": "deployIfNotExists",
    "details": {
      "type": "Microsoft.Network/privateEndpoints/privateDnsZoneGroups",
      "roleDefinitionIds": [
        "/providers/Microsoft.Authorization/roleDefinitions/4d97b98b-1d4f-4787-a291-c67834d212e7"
      ],
      "deployment": { "properties": { "..." : "ARM template that creates the zone group" } }
    }
  }
}

The role definition ID above is Network Contributor. The DINE managed identity needs it (plus Private DNS Zone Contributor on the zone) to write the zone group and record. Two operational notes: DINE acts only when a resource is created or updated, so run a remediation task to backfill endpoints that predate the assignment; and because the policy hardcodes the central zone ID, every endpoint of that service across every spoke lands in the same zone automatically. That is the whole point: governance, not goodwill.
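To reason about which endpoints the rule's `if` block selects, it can help to evaluate the same condition offline. The sketch below is not the policy engine: it mimics only the type check and the groupIds count, assumes the resource shape used in the Bicep example, and compares strings case-insensitively on the assumption that policy `equals` does the same. The `matches_rule` name is hypothetical.

```python
def matches_rule(resource, group_id="blob"):
    """Mimic the policy 'if': a private endpoint with at least one matching groupId."""
    if resource.get("type", "").lower() != "microsoft.network/privateendpoints":
        return False
    connections = resource.get("properties", {}).get("privateLinkServiceConnections", [])
    count = sum(
        1
        for connection in connections
        for gid in connection.get("properties", {}).get("groupIds", [])
        if gid.lower() == group_id.lower()
    )
    return count >= 1  # same threshold as "greaterOrEquals": 1


blob_pe = {
    "type": "Microsoft.Network/privateEndpoints",
    "properties": {
        "privateLinkServiceConnections": [
            {"name": "conn-stblob-app1", "properties": {"groupIds": ["blob"]}}
        ]
    },
}
vault_pe = {
    "type": "Microsoft.Network/privateEndpoints",
    "properties": {
        "privateLinkServiceConnections": [
            {"name": "conn-kv-app1", "properties": {"groupIds": ["vault"]}}
        ]
    },
}

print(matches_rule(blob_pe))   # selected by the blob policy
print(matches_rule(vault_pe))  # left for the Key Vault policy
```

This also shows why there is one policy per service: each one matches a single group-id and deploys a zone group for that service's zone.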

Choosing the effect for your private-endpoint governance is itself a decision; here is what each gives you:

| Effect | What it does | Acts on existing? | When to use |
|---|---|---|---|
| DeployIfNotExists | Auto-creates the missing zone group | With remediation task | The default: make every PE correct automatically |
| AuditIfNotExists | Flags PEs lacking a zone group, changes nothing | Yes (reports) | Discovery / pre-enforcement phase |
| Deny | Blocks PE creation that violates a condition | No (prevention) | Forbid PEs in non-approved subnets/subs |
| Modify | Adds/updates a property (e.g. a tag) | With remediation | Tagging, not zone-group creation |
| Disabled | Turns the rule off | n/a | Temporarily during migration |

The two roles the DINE identity needs, and exactly why:

| Role | Scope to grant | What it lets the policy do |
|---|---|---|
| Network Contributor | The endpoint’s subscription / RG | Create the privateDnsZoneGroups child object |
| Private DNS Zone Contributor | The central zone (connectivity sub) | Write the A record into the zone |

DINE remediation has a predictable lifecycle; knowing each stage stops you from “why didn’t it fix it?” confusion:

| Stage | Trigger | What happens | What you do |
|---|---|---|---|
| Assignment created | You assign the initiative | A managed identity is created | Grant it the two roles above |
| New endpoint appears | Spoke owner creates a PE | DINE evaluates and deploys the zone group | Nothing; it’s automatic |
| Existing endpoints | (predate the assignment) | Marked non-compliant, not fixed | Create a remediation task to backfill |
| Compliance drift | Someone deletes a zone group | Flagged non-compliant on next scan | Re-run remediation or let next eval fix |
| Reporting | Continuous | Compliance % in Policy blade | Alert on non-compliant count > 0 |

Create the remediation task to backfill the estate:

az policy remediation create \
  --name remediate-pe-blob-dns \
  --policy-assignment "$ASSIGNMENT_ID" \
  --resource-discovery-mode ReEvaluateCompliance

On-prem and hybrid resolution with the Private Resolver

VNet clients are solved. On-prem clients are not: a server in your datacenter querying mystorageacct.blob.core.windows.net hits its own DNS, gets the public CNAME, and has no way to reach 168.63.129.16 — that address is non-routable outside Azure. You need an in-Azure resolver that on-prem can forward to.

The modern answer is the Azure DNS Private Resolver, a managed service (no DNS VMs to patch). Deploy it in the hub with an inbound endpoint (an IP on-prem forwards to) and an outbound endpoint (for queries Azure sends back out to on-prem).

RESOLVER_RG="rg-connectivity-dns"

az dns-resolver create \
  --name dnspr-hub \
  --resource-group $RESOLVER_RG \
  --location eastus2 \
  --id "$HUB_VNET_ID"

# Inbound: gets a private IP in a dedicated /28 subnet delegated to the resolver
az dns-resolver inbound-endpoint create \
  --dns-resolver-name dnspr-hub \
  --resource-group $RESOLVER_RG \
  --name inbound \
  --location eastus2 \
  --ip-configurations "[{private-ip-allocation-method:Dynamic,subnet:{id:$INBOUND_SUBNET_ID}}]"

# Outbound: needs its own delegated /28 subnet
az dns-resolver outbound-endpoint create \
  --dns-resolver-name dnspr-hub \
  --resource-group $RESOLVER_RG \
  --name outbound \
  --location eastus2 \
  --subnet "$OUTBOUND_SUBNET_ID"

Both endpoints require dedicated subnets delegated to Microsoft.Network/dnsResolvers, minimum /28. Plan IP space for this in the hub up front. The resolver’s pieces, and what each is for:

| Resolver component | What it is | Subnet requirement | Direction | Who talks to it |
|---|---|---|---|---|
| DNS Private Resolver | The managed service object | Lives in the hub VNet | | Container for the endpoints |
| Inbound endpoint | A private IP that accepts queries | Delegated /28 | On-prem → Azure | On-prem DNS conditional forwarders |
| Outbound endpoint | Source for queries leaving Azure | Delegated /28 | Azure → on-prem | Forwarding rulesets attach here |
| Forwarding ruleset | Domain → target-DNS mappings | n/a (logical) | Azure → on-prem | Linked to VNets that should obey it |
| Ruleset VNet link | Applies a ruleset to a VNet | n/a | | The VNets whose queries it governs |
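Planning the two delegated /28 subnets up front is a small address-math exercise. The sketch below assumes a hypothetical 10.10.0.0/27 block reserved in the hub for DNS; the helper names are illustrative. It also relies on the fact that Azure reserves the first four addresses of every subnet, so the first address an endpoint can receive is the network address plus four.

```python
import ipaddress


def carve_resolver_subnets(dns_block, prefix=28):
    """Return the first two /prefix subnets of the block: (inbound, outbound)."""
    block = ipaddress.ip_network(dns_block)
    if block.prefixlen >= prefix:
        raise ValueError(f"{block} is too small to hold two /{prefix} subnets")
    candidates = block.subnets(new_prefix=prefix)
    return next(candidates), next(candidates)


def first_assignable(subnet):
    """Azure reserves .0 to .3 in each subnet, so the first usable address is +4."""
    return subnet.network_address + 4


inbound, outbound = carve_resolver_subnets("10.10.0.0/27")  # hypothetical hub range
print(inbound, outbound)
print(first_assignable(inbound))  # candidate inbound endpoint IP with dynamic allocation
```

Knowing the likely inbound IP before deployment is useful because on-prem conditional forwarders and firewall rules have to reference it.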

The Private Resolver vs the legacy DNS-forwarder-VM approach — why the managed service wins for new builds:

| Dimension | DNS Private Resolver (managed) | DNS forwarder VMs (legacy) |
|---|---|---|
| Patching / OS upkeep | None (PaaS) | You patch Windows/BIND |
| High availability | Built-in, zone-resilient | You build it (2+ VMs, LB) |
| Scaling under QPS | Managed (high QPS/endpoint) | Size and scale VMs yourself |
| Conditional forwarding | Native forwarding rulesets | BIND/Windows config files |
| Cost model | Per endpoint-hour + queries | VM compute + management time |
| Subnet need | Two delegated /28s | A subnet for the VMs |
| When still chosen | New builds, almost always | Legacy estates, exotic DNS needs |

Conditional forwarding rulesets (Azure to on-prem)

When an Azure workload needs to resolve an on-prem name (db01.corp.local), the resolver’s outbound endpoint sends it to your on-prem DNS via a forwarding ruleset. Each rule maps a domain to target DNS servers; link the ruleset to the VNets that should obey it.

az dns-resolver forwarding-ruleset create \
  --name frs-onprem \
  --resource-group $RESOLVER_RG \
  --location eastus2 \
  --outbound-endpoints "[{id:$OUTBOUND_ENDPOINT_ID}]"

az dns-resolver forwarding-rule create \
  --ruleset-name frs-onprem \
  --resource-group $RESOLVER_RG \
  --name rule-corp-local \
  --domain-name "corp.local." \
  --forwarding-rule-state Enabled \
  --target-dns-servers "[{ip-address:10.50.0.10,port:53},{ip-address:10.50.0.11,port:53}]"

az dns-resolver vnet-link create \
  --ruleset-name frs-onprem \
  --resource-group $RESOLVER_RG \
  --name link-hub \
  --id "$HUB_VNET_ID"

The trailing dot on corp.local. is mandatory — these are fully qualified domain names. A forwarding rule has a small, exact set of fields; getting any of them wrong fails silently:

Rule field Example Meaning Common mistake
domain-name corp.local. The suffix this rule matches Missing the trailing dot
forwarding-rule-state Enabled Whether the rule is active Left Disabled after testing
target-dns-servers 10.50.0.10:53 On-prem DNS to forward to Pointing at a public resolver
Ruleset → outbound endpoint $OUTBOUND_ENDPOINT_ID Which egress the queries use Forgetting to attach the endpoint
Ruleset → VNet link hub + spokes Which VNets obey the ruleset Linking the ruleset to no VNet
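Because the missing trailing dot is the most common mistake in the table above, it can help to normalize domain names before they reach the rule definition. A minimal sketch, assuming names arrive as plain strings; ensure_trailing_dot is a hypothetical helper, not part of the Azure CLI.

```shell
# Append the trailing dot if it is missing; leave a qualified name unchanged.
ensure_trailing_dot() {
  case "$1" in
    *.) printf '%s\n' "$1" ;;
    *)  printf '%s.\n' "$1" ;;
  esac
}

ensure_trailing_dot corp.local     # prints corp.local.
ensure_trailing_dot corp.local.    # prints corp.local.
```

The result can then be passed as the --domain-name value, for example `--domain-name "$(ensure_trailing_dot corp.local)"`.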

ExpressRoute / VPN inbound resolution (on-prem to Azure)

This is the reverse direction and the one teams forget. For on-prem clients to resolve private endpoints, point a conditional forwarder on your on-prem DNS at the resolver’s inbound endpoint IP, for the public DNS suffixes of the PaaS services.

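The subtlety: you forward the public zone names (blob.core.windows.net, vault.azure.net), not the privatelink.* names. On-prem asks for mystorageacct.blob.core.windows.net; the inbound endpoint resolves it inside Azure, where 168.63.129.16 follows the CNAME into your linked privatelink zone and returns the private IP. On-prem never references privatelink directly. Key Vault is the odd one out: clients query <name>.vault.azure.net while the private zone is privatelink.vaultcore.azure.net, so forward vault.azure.net, with vaultcore.azure.net alongside it.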

On Windows Server DNS, one forwarder per suffix:

$inbound = "10.10.0.4"   # resolver inbound endpoint IP

"blob.core.windows.net",
"file.core.windows.net",
"vault.azure.net",
"vaultcore.azure.net",
"database.windows.net" | ForEach-Object {
  Add-DnsServerConditionalForwarderZone `
    -Name $_ `
    -MasterServers $inbound `
    -ReplicationScope "Forest"
}

Traffic to the inbound endpoint rides your existing ExpressRoute private peering or VPN — the resolver IP is a normal private address in the hub, reachable over the same routes your workloads already use. No public exposure. The direction matrix below is the single thing most teams get backwards — which name you forward, where, and why:

Direction | Configured where | Forward what | Forward to | Net effect
On-prem → Azure PaaS | On-prem DNS (Windows/BIND) | Public suffix (blob.core.windows.net) | Resolver inbound IP | On-prem gets the private endpoint IP
Azure → on-prem | Resolver outbound + ruleset | On-prem suffix (corp.local) | On-prem DNS servers | Azure workloads resolve internal names
In-Azure → Azure PaaS | Nothing (automatic) | - | 168.63.129.16 + linked zone | Spoke VMs already resolve private
On-prem → on-prem | On-prem DNS (unchanged) | - | On-prem DNS | Untouched by this design

A common rollout error is forwarding the wrong name; here is the exact right/wrong list:

You forward (on-prem) Correct? Why
blob.core.windows.net → inbound IP Yes Azure follows the CNAME into the linked privatelink zone
privatelink.blob.core.windows.net → inbound IP No The privatelink name is internal plumbing; on-prem never asks for it
*.azure.com → inbound IP No Far too broad; hijacks unrelated resolution
vault.azure.net → inbound IP Yes Key Vault clients query <name>.vault.azure.net; vaultcore appears only in the privatelink zone name (forward vaultcore.azure.net as well)
core.windows.net → inbound IP Risky Catches every storage service; prefer per-service suffixes
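The right/wrong list above can be encoded as a pre-flight check on a proposed forwarder list. This is a sketch only: forwarder_verdict is a hypothetical helper, and its default branch assumes any suffix not caught by the rules is acceptable. It does not verify that the suffix belongs to a real Azure service.

```shell
# Classify a suffix proposed for an on-prem conditional forwarder.
# Prints yes, no, or risky, mirroring the list above.
forwarder_verdict() {
  suffix=${1%.}                    # tolerate a trailing dot
  case "$suffix" in
    privatelink.*)    echo no ;;     # internal plumbing name
    '*'*)             echo no ;;     # wildcard, far too broad
    core.windows.net) echo risky ;;  # catches every storage service
    *)                echo yes ;;    # assumption: anything else is a per-service suffix
  esac
}

for s in blob.core.windows.net privatelink.blob.core.windows.net '*.azure.com' core.windows.net; do
  echo "$s: $(forwarder_verdict "$s")"
done
```

Running the loop gives yes, no, no and risky in that order, which matches the rows for those four names.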

Regional zones and the long zone-name list

The most common rollout bug is using the wrong zone name. Several services use regional or non-obvious zone names, and a few use a different suffix entirely. Get the name wrong and the zone group silently writes records nowhere useful. Reference values you will use constantly:

Service Subresource (group-id) Private DNS zone name
Blob storage blob privatelink.blob.core.windows.net
File storage file privatelink.file.core.windows.net
Queue storage queue privatelink.queue.core.windows.net
Table storage table privatelink.table.core.windows.net
Data Lake Gen2 (HNS) dfs privatelink.dfs.core.windows.net
Key Vault vault privatelink.vaultcore.azure.net
Azure SQL DB sqlServer privatelink.database.windows.net
SQL Managed Instance managedInstance privatelink.{dnszone}.database.windows.net
Cosmos DB (SQL/Core) Sql privatelink.documents.azure.com
Cosmos DB (MongoDB) MongoDB privatelink.mongo.cosmos.azure.com
PostgreSQL Flexible postgresqlServer privatelink.postgres.database.azure.com
App Service / Functions sites privatelink.azurewebsites.net
Container Registry registry privatelink.azurecr.io (+ regional data zone)
Event Hubs / Service Bus namespace privatelink.servicebus.windows.net
AKS API server management privatelink.<region>.azmk8s.io
Azure Monitor (AMPLS) azuremonitor privatelink.monitor.azure.com (+ companion set)
Azure Cache for Redis redisCache privatelink.redis.cache.windows.net
Azure AI Search searchService privatelink.search.windows.net
Azure OpenAI / AI Services account privatelink.openai.azure.com / cognitiveservices.azure.com
Azure App Configuration configurationStores privatelink.azconfig.io
Azure Web PubSub / SignalR webpubsub / signalr privatelink.webpubsub.azure.com / service.signalr.net

The four traps in that list deserve their own table, because each has burned a real rollout:

Trap What people assume The reality Consequence if wrong
Key Vault suffix privatelink.vault.azure.net The zone is privatelink.vaultcore.azure.net Zone never matches; always public
AKS regional zone One global zone privatelink.<region>.azmk8s.io (per region) Wrong region → no API-server resolution
Azure Monitor (AMPLS) One monitor zone A set: monitor, oms, ods, agentsvc, plus blob Partial telemetry; agents fail silently
Container Registry One azurecr.io zone Main zone plus a <region>.data.azurecr.io zone for image pulls Logins work, pulls fail
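One way to keep zone names out of human memory is a small lookup keyed on the group-id. The sketch below copies only the single-zone rows from the reference table; zone_for_group_id is a hypothetical helper, and services with regional or multi-zone names (SQL Managed Instance, AKS, AMPLS, Container Registry) are deliberately left out because a flat lookup would hide their extra zones.

```shell
# Map a private endpoint group-id to its commercial-cloud zone name.
# Entries are taken from the reference table; unknown ids fail loudly.
zone_for_group_id() {
  case "$1" in
    blob|file|queue|table|dfs) echo "privatelink.$1.core.windows.net" ;;
    vault)               echo "privatelink.vaultcore.azure.net" ;;
    sqlServer)           echo "privatelink.database.windows.net" ;;
    Sql)                 echo "privatelink.documents.azure.com" ;;
    MongoDB)             echo "privatelink.mongo.cosmos.azure.com" ;;
    postgresqlServer)    echo "privatelink.postgres.database.azure.com" ;;
    sites)               echo "privatelink.azurewebsites.net" ;;
    namespace)           echo "privatelink.servicebus.windows.net" ;;
    redisCache)          echo "privatelink.redis.cache.windows.net" ;;
    searchService)       echo "privatelink.search.windows.net" ;;
    configurationStores) echo "privatelink.azconfig.io" ;;
    *) echo "no single-zone mapping for group-id: $1" >&2; return 1 ;;
  esac
}

zone_for_group_id vault   # prints privatelink.vaultcore.azure.net
```

Failing on an unknown group-id is the point of the design: a guessed zone name is exactly the bug the table exists to prevent.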

When you are unsure, the authoritative list is Microsoft’s “Azure Private Endpoint DNS configuration” doc — treat it as the source of truth and do not guess. Sovereign and Government clouds use entirely different suffixes (*.core.usgovcloudapi.net, *.vaultcore.usgovcloudapi.net, etc.). If you run in those clouds, derive names from that cloud’s documentation. The commercial-vs-sovereign suffix shift is total:

Service Commercial (public) US Government cloud
Blob privatelink.blob.core.windows.net privatelink.blob.core.usgovcloudapi.net
Key Vault privatelink.vaultcore.azure.net privatelink.vaultcore.usgovcloudapi.net
Azure SQL privatelink.database.windows.net privatelink.database.usgovcloudapi.net
App Service privatelink.azurewebsites.net privatelink.azurewebsites.us

Architecture at a glance

Follow a single request left to right and the whole design falls into place. A spoke VM (top-left) runs an application that connects to mystorageacct.blob.core.windows.net, the public FQDN, because that is what its SDK and connection string contain. It asks its default resolver, which in a VNet using Azure-provided DNS is the virtual IP 168.63.129.16 in the resolution layer. Public Azure DNS returns the CNAME to mystorageacct.privatelink.blob.core.windows.net, and because this spoke’s VNet is linked to the central privatelink.blob.core.windows.net zone in the connectivity subscription, the resolver follows that CNAME straight into the central zone and reads the A record, 10.x.x.4, which is the private IP of the private endpoint NIC sitting in the spoke’s snet-pe. The app then opens TCP 443 to that private IP, and the storage account (with public access disabled) accepts the connection because it arrives over Private Link. No byte of that traffic ever touched the public internet, and the only thing that made it private was a DNS answer.

The on-prem host (bottom-left) takes a longer path to the same answer: it cannot see 168.63.129.16, so its on-prem DNS conditionally forwards the public suffix to the resolver’s inbound endpoint, the query is resolved inside Azure where the linked zones are visible, and the private IP comes back over ExpressRoute or VPN. The numbered badges mark exactly where this breaks in production. Badge 1 is a spoke whose VNet was never linked — it falls through to the public IP. Badge 2 is the estate-killer: a DNS-proxy NVA in the hub that all spokes forward through, whose own VNet lost its zone links on a rebuild, so every spoke goes public at once. Badge 3 is an endpoint created without a zone group, so no A record is ever written. Badge 4 is on-prem missing its conditional forwarder, resolving public for everything. Badge 5 is split-brain — a leftover spoke-local zone or an orphaned manual record returning a stale, recycled IP. The legend narrates each as symptom · confirm · fix; read it as the field guide for the rest of this article.

Figure: Hub-and-spoke private DNS resolution architecture. A spoke VM and an on-prem host on the left resolve a public storage FQDN through Azure DNS at 168.63.129.16 and the Azure DNS Private Resolver in the hub resolution layer; the resolver follows the privatelink CNAME into a single centrally-hosted set of privatelink.* zones in the connectivity subscription, which are linked to every spoke VNet and governed by a DeployIfNotExists policy; the central zone returns the A record for a private endpoint NIC in the spoke subnet, which connects on port 443 to the PaaS targets (storage, Key Vault and Azure SQL). Five numbered badges mark the unlinked-VNet, unlinked-DNS-proxy, missing-zone-group, missing-on-prem-forwarder and split-brain failure points.

Real-world scenario

Northwind Mutual, a regulated insurer, ran a Palo Alto NVA in the hub as DNS proxy for all forty spokes. Every spoke’s VNet DNS pointed at the firewall’s internal IP; the firewall, in turn, forwarded to Azure’s default resolver. AKS private clusters resolved fine, storage worked, Key Vault worked — for eight months. Then, during a routine firewall version upgrade, the network team rebuilt the firewall’s VNet from a clean Bicep template to pick up a new subnet layout. Within minutes, every workload in every spoke started failing: storage SDKs threw connection errors, the AKS API server became unreachable from pods, and the on-call channel lit up with “is storage down?” across six unrelated product teams at once.

It was not storage. It was DNS, and the blast radius was total because of the proxy topology. The firewall’s own VNet had been linked to the privatelink zones manually, by an az network private-dns link vnet create someone ran during the original migration — a command that lived in nobody’s IaC. The clean rebuild recreated the firewall VNet with no zone links. So the chain collapsed exactly here: spokes forwarded DNS to the firewall (fine), the firewall forwarded to 168.63.129.16 in the firewall’s own VNet (fine), but that VNet now had zero privatelink zones linked — so the resolver had nothing to follow the CNAME into, and returned the public A record for everything. Forty spokes, hundreds of endpoints, public IPs everywhere, simultaneously. Because the storage and SQL firewalls denied the public source IPs, every connection failed closed. The incident ran ninety minutes before someone ran nslookup mystorageacct.blob.core.windows.net from a spoke VM, saw a public address, and realised it was resolution, not the services.

The fix was twofold. First, move the firewall VNet’s zone links into the same for_each that links the spokes, so the DNS-proxy VNet is never special-cased:

locals {
  dns_resolving_vnets = merge(var.spoke_vnets, {
    "hub-firewall" = var.firewall_vnet_id
  })
}

That single merge feeds the existing setproduct link resource, guaranteeing the proxy VNet gets every zone the spokes get — forever, automatically, on every apply. Second, they added an audit-style Azure Policy on Microsoft.Network/privateDnsZones/virtualNetworkLinks checked against an allowlist, so any link created or deleted outside the pipeline raises a non-compliant flag within minutes, and an alert fires on the count. The deeper lesson Northwind took away: when spokes resolve through a DNS-proxy NVA, that NVA’s VNet is the single point of failure for all private resolution — it must carry the full zone-link set, that set belongs in code, and the one resource you can least afford to manage by hand is the one most likely to be created with a quick az command during a migration nobody documents.

Advantages and disadvantages

The centralized hub-and-spoke private-DNS design is the right default at scale, but it concentrates risk that you must consciously manage.

Advantages Disadvantages
One copy of each zone — no per-spoke drift The central zone set is a shared dependency for the whole estate
New spoke onboards by adding links (one IaC iteration) Mis-link or unlink the DNS-proxy VNet and everything breaks at once
Policy auto-binds every endpoint — no human step DINE remediation needs RBAC and a backfill task; existing PEs aren’t auto-fixed
Connectivity team owns DNS; spokes just create PEs Cross-subscription RBAC adds a setup step
On-prem resolves into Azure via one managed resolver Resolver needs two delegated /28s planned in hub IP space up front
Failures are deterministic and fast to confirm (nslookup) Resolution failures succeed with the wrong answer — silent until something denies the public IP
Scales to ~1,000 VNet links per zone Beyond that you split the estate or add a second zone copy
Works identically for storage, KV, SQL, AKS, AMPLS Each service’s zone name must be exactly right (regional/special suffixes)

When the central model is decisively right: any landing zone with more than a handful of spokes, any regulated workload, any hybrid estate. When you might deviate: a single, isolated VNet with two endpoints and no on-prem clients can host its own zone locally without ceremony — though even then, doing it the central way costs nothing extra and future-proofs the growth. The one thing you never do at scale is the per-spoke-zone anti-pattern; it feels simpler on day one and becomes an unmanageable drift surface by spoke ten.

Hands-on lab

A self-contained walk-through: create a storage account with public access disabled, a spoke VNet, a private endpoint, the central zone, the link, and a zone group — then prove resolution returns a private IP. Run it in a sandbox subscription; the storage account and a small VNet cost pennies for an hour, and teardown removes everything.

1. Variables and resource group.

LOC=eastus2
RG=rg-pe-lab
az group create -n $RG -l $LOC
ACCT="stpelab$RANDOM"

2. Create a VNet with a dedicated private-endpoint subnet.

az network vnet create -g $RG -n vnet-lab --address-prefixes 10.20.0.0/16 \
  --subnet-name snet-pe --subnet-prefixes 10.20.1.0/24
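# Set PE network policies explicitly. Disabled is the default on new subnets, and
# newer CLI versions deprecate --disable-private-endpoint-network-policies in favor of this flag.
az network vnet subnet update -g $RG --vnet-name vnet-lab -n snet-pe \
  --private-endpoint-network-policies Disabled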

3. Create a storage account and disable public access.

az storage account create -g $RG -n $ACCT -l $LOC --sku Standard_LRS --kind StorageV2
az storage account update -g $RG -n $ACCT --public-network-access Disabled
STORAGE_ID=$(az storage account show -g $RG -n $ACCT --query id -o tsv)

4. Create the private endpoint for the blob subresource.

az network private-endpoint create -g $RG -n pe-blob \
  --vnet-name vnet-lab --subnet snet-pe \
  --private-connection-resource-id "$STORAGE_ID" \
  --group-id blob --connection-name conn-blob

5. Create the central Private DNS zone and link the VNet. (In production this zone is in the connectivity subscription; here it’s in the same RG for simplicity.)

az network private-dns zone create -g $RG -n privatelink.blob.core.windows.net
az network private-dns link vnet create -g $RG \
  --zone-name privatelink.blob.core.windows.net \
  --name link-lab --virtual-network vnet-lab --registration-enabled false
ZONE_ID=$(az network private-dns zone show -g $RG \
  -n privatelink.blob.core.windows.net --query id -o tsv)

6. Bind the endpoint to the zone with a zone group (lets Azure write and own the A record).

az network private-endpoint dns-zone-group create -g $RG \
  --endpoint-name pe-blob --name default \
  --private-dns-zone "$ZONE_ID" --zone-name privatelink-blob

7. Verify the A record was written automatically.

az network private-dns record-set a list -g $RG \
  --zone-name privatelink.blob.core.windows.net -o table
# Expect an A record for the account name pointing at 10.20.1.x

8. Prove resolution from inside the VNet. Create a tiny VM in the spoke (or use an existing one) and resolve the public FQDN — it must return the private IP:

az vm create -g $RG -n vm-test --image Ubuntu2204 --vnet-name vnet-lab \
  --subnet snet-pe --admin-username azureuser --generate-ssh-keys --size Standard_B1s
az vm run-command invoke -g $RG -n vm-test --command-id RunShellScript \
  --scripts "nslookup ${ACCT}.blob.core.windows.net"
# Expect: canonical name = ...privatelink.blob.core.windows.net ; Address: 10.20.1.x

A public IP here means the zone isn’t linked or the zone group is missing — re-check steps 5 and 6.

9. Teardown.

az group delete -n $RG --yes --no-wait

The lab maps one-to-one onto the production pattern; the only differences at scale are where the zone lives (connectivity subscription), how many links exist (one per spoke via for_each), and who creates the zone group (the DINE policy, not you).
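The check in step 8 can be scripted by pulling the answer address out of the nslookup output instead of reading it by eye. The sketch below assumes the Linux nslookup layout, where the resolver's own Address line comes first and the answer follows a Name line. answer_addresses is a hypothetical helper, and the sample text is illustrative rather than captured from a real run.

```shell
# Print the answer addresses from nslookup output on stdin, skipping the
# resolver's own "Address:" line that appears before the answer section.
answer_addresses() {
  awk '/^Name:/ { in_answer = 1; next }
       in_answer && /^Address/ { print $NF }'
}

# Illustrative sample in the shape the lab expects.
answer_addresses <<'EOF'
Server:         127.0.0.53
Address:        127.0.0.53#53

Non-authoritative answer:
stpelab12345.blob.core.windows.net      canonical name = stpelab12345.privatelink.blob.core.windows.net.
Name:   stpelab12345.privatelink.blob.core.windows.net
Address: 10.20.1.4
EOF
# prints 10.20.1.4
```

If the printed address falls outside 10.20.1.0/24 in the lab, the zone link or the zone group is the place to look.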

Common mistakes & troubleshooting

Resolution failures are binary and fast to diagnose once you know the playbook. This is the table to keep open during an incident: the symptom you observe, the root cause, the exact command to confirm it, and the fix. Read the prose under it for the non-obvious ones.

# Symptom Root cause Confirm (exact command / path) Fix
1 Spoke nslookup returns a public IP This VNet has no link to the central zone az network private-dns link vnet list -g $HUB_RG --zone-name <zone> (spoke absent) Add a resolution-only link (--registration-enabled false)
2 FQDN returns public; zone is linked Endpoint has no zone group, so no A record az network private-endpoint dns-zone-group list -g <rg> --endpoint-name <pe> (empty) Create the zone group, or let DINE + remediation backfill
3 Record exists but points at a wrong/old IP Manual A record orphaned after redeploy az network private-dns record-set a list -g $HUB_RG --zone-name <zone> vs PE IP Delete the manual record; bind a zone group instead
4 All spokes resolve public at once DNS-proxy NVA’s VNet lost its zone links az network private-dns link vnet list for the firewall VNet (none) Re-link the proxy VNet; put it in the spokes’ for_each
5 On-prem returns public; spoke returns private On-prem conditional forwarder missing/wrong nslookup <fqdn> from on-prem; check forwarder targets Forward the public suffix to the resolver inbound IP
6 On-prem forwarder set, still public Forwarder points at privatelink.* not the public suffix Inspect on-prem forwarder zone names Forward blob.core.windows.net, not privatelink.blob…
7 Resolution random (private or public) Split-brain: spoke-local zone + central link both present List zones in the spoke RG/sub for a duplicate Delete the spoke-local zone; keep only the central one
8 Record written but never used Wrong zone name for the service (regional/special) Compare zone name to the reference table Recreate in the correct zone (vaultcore, <region>.azmk8s.io…)
9 Key Vault resolves public despite a zone Zone was named privatelink.vault.azure.net instead of privatelink.vaultcore.azure.net az network private-dns zone list for the exact name Create privatelink.vaultcore.azure.net; rebind
10 AMPLS telemetry partially missing Only monitor zone created, not the full set Check for oms/ods/agentsvc/blob companion zones Create all five AMPLS zones and link them
11 New endpoint not auto-bound DINE identity lacks RBAC on the zone Policy compliance shows the deploy failed Grant Private DNS Zone Contributor on the zone
12 Old endpoints non-compliant, unfixed DINE only acts on new resources Policy assignment shows non-compliant existing PEs Run a remediation task to backfill
13 Private IP resolves but connection times out Not DNS — NSG/UDR/peering/firewall blocks 443 nc -vz <privateIP> 443 from the spoke Fix routing/NSG (see the VNet troubleshooting article)
14 Storage 403 after going private Resolution returned public; PaaS firewall denied it nslookup shows public IP → it’s resolution Fix the DNS link/zone group, not the storage ACL

The non-obvious failures, expanded

The estate-wide failure (row 4) is the one to fear. When spokes use a hub NVA as DNS proxy, that NVA’s VNet must itself be linked to every privatelink zone, because the resolver only consults zones linked to the VNet the query is resolved in. The proxy resolves in its own VNet, so the proxy’s VNet — not the spokes’ — needs the links. Confirm by listing links for the firewall VNet, not the spoke. The fix is to treat the proxy VNet as just another resolving VNet in your IaC, never as a special case (see the real-world scenario).

On-prem forwards the public name, never privatelink (rows 5–6). The whole point of the inbound endpoint is to resolve inside Azure, where 168.63.129.16 will follow the CNAME into the linked privatelink zone. If you forward privatelink.blob.core.windows.net from on-prem, you’ve forwarded the internal plumbing name that on-prem should never reference — and resolution fails. Forward the public suffix (blob.core.windows.net) to the inbound endpoint IP, full stop.

Split-brain is non-deterministic and maddening (row 7). If a spoke still hosts its own privatelink.* zone and is linked to the central one, lookups are answered by whichever the resolver consults first — so the same FQDN returns private sometimes and public other times, often differing between VMs. Pick the central zone, delete the local copies, and add the audit policy from the scenario so a stray local zone is flagged fast.

Orphaned records resolve to recycled IPs (row 3). When a zone group is bypassed or a resource is force-deleted, hand-written A records linger and may resolve to an IP that’s since been reassigned to a different endpoint — a silent cross-wiring. Periodically diff record-sets against live endpoints, and link lists against live VNets; orphans are a real outage source at scale.
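The link half of that periodic diff reduces to a set difference between the VNets that should be linked and the VNets that are. A minimal sketch, assuming each list is a file with one VNet identifier per line; the actual list could come from the az network private-dns link vnet list command with a query on virtualNetwork.id. missing_links and the VNet names are hypothetical.

```shell
# Print entries in the expected file ($1) that do not appear in the actual file ($2).
# -F fixed strings, -x whole-line match, -i because Azure resource IDs are
# case-insensitive, -v invert so only the missing entries remain.
missing_links() {
  grep -F -x -i -v -f "$2" "$1"
}

expected=$(mktemp)
actual=$(mktemp)
printf '%s\n' vnet-spoke-01 vnet-spoke-02 vnet-hub-firewall > "$expected"
printf '%s\n' vnet-spoke-01 vnet-spoke-02 > "$actual"

missing_links "$expected" "$actual"   # prints vnet-hub-firewall
rm -f "$expected" "$actual"
```

Swapping the two arguments reports the opposite drift: links that exist in Azure but are not in the expected list, which is how an out-of-band link shows up.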

The decision table for “is this even a DNS problem?” — run this first, before you touch any zone:

If you see… It’s probably… Do this
nslookup returns a public IP A DNS/link/zone-group problem Work the resolution playbook above
nslookup returns a private IP but connection fails A network problem (NSG/UDR/peering) nc -vz <ip> 443; fix routing, not DNS
Private from spoke, public from on-prem A conditional-forwarder gap Fix the on-prem forwarder → inbound IP
Private sometimes, public other times Split-brain (duplicate zones) Delete the spoke-local zone
Everything public, everywhere, suddenly The proxy VNet lost its links Re-link the NVA/firewall VNet
Public for one service only Wrong zone name for that service Check the regional/special-suffix table

Verify

Resolution is binary and easy to test. From a VM in a spoke, the FQDN must resolve to a private (RFC 1918) address:

nslookup mystorageacct.blob.core.windows.net
# Expect:
#   ...canonical name = mystorageacct.privatelink.blob.core.windows.net
#   Address: 10.x.x.x        <- private. Public IP here = broken.
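The private-or-public judgment can be automated for monitoring. The sketch below matches the three RFC 1918 ranges by string pattern; is_rfc1918 is a hypothetical helper, and it assumes its input is already a well-formed dotted-quad IPv4 address, since it does no validation.

```shell
# Succeed when the address is in 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16.
is_rfc1918() {
  case "$1" in
    10.*)                                  return 0 ;;
    192.168.*)                             return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *)                                     return 1 ;;
  esac
}

for ip in 10.20.1.4 172.31.0.9 172.32.0.9 20.60.1.1; do
  if is_rfc1918 "$ip"; then echo "$ip private"; else echo "$ip PUBLIC: resolution is broken"; fi
done
```

The first two addresses print as private and the last two as public; 172.32.0.9 is included because it sits just outside the 172.16.0.0/12 range.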

Confirm the central zone actually holds the record, and audit which VNets are linked:

# Record exists and points at the endpoint's private IP?
az network private-dns record-set a list \
  --resource-group $HUB_RG \
  --zone-name privatelink.blob.core.windows.net -o table

# Which VNets can resolve this zone?
az network private-dns link vnet list \
  --resource-group $HUB_RG \
  --zone-name privatelink.blob.core.windows.net \
  --query "[].{name:name, vnet:virtualNetwork.id, reg:registrationEnabled}" -o table

# Endpoint approved and connected?
az network private-endpoint show \
  --name pe-stblob-app1 --resource-group rg-app1 \
  --query "privateLinkServiceConnections[0].privateLinkServiceConnectionState" -o json

From on-prem, the same nslookup against the public FQDN must also return the private IP — proving the conditional forwarder reaches the inbound endpoint. If on-prem returns the public IP but the spoke returns private, the forwarder or the route to the inbound endpoint is the problem, not the zone. The four-quadrant truth table tells you instantly which half of the design is broken:

Spoke result On-prem result Verdict Where to look
Private Private Healthy end to end Nothing — you’re done
Private Public In-Azure good; on-prem forwarder broken On-prem conditional forwarder → inbound IP
Public Public Central zone/link broken for everyone Zone exists? VNet linked? Proxy VNet linked?
Public Private Rare; spoke link missing but on-prem path resolves Add the spoke VNet link
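The truth table above translates directly into a lookup that a post-change test can call with the two observed results. A minimal sketch; resolution_verdict is a hypothetical name, and the strings it prints paraphrase the table's "Where to look" column.

```shell
# Map (spoke result, on-prem result) to where to look, per the truth table.
# Each argument is the word "private" or "public".
resolution_verdict() {
  case "$1/$2" in
    private/private) echo "healthy" ;;
    private/public)  echo "on-prem conditional forwarder or route to inbound IP" ;;
    public/public)   echo "central zone, VNet link or proxy VNet link" ;;
    public/private)  echo "spoke VNet link" ;;
    *) echo "usage: resolution_verdict private|public private|public" >&2; return 2 ;;
  esac
}

resolution_verdict private public
# prints: on-prem conditional forwarder or route to inbound IP
```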

Best practices

Production-grade rules distilled from running this at landing-zone scale:

# Practice Why it matters
1 Host one copy of each privatelink.* zone in the connectivity subscription Eliminates per-spoke drift; one source of truth
2 Link every resolving VNet with registration-enabled false Resolution-only; a VNet can auto-register into only one zone, and a shared privatelink zone should not collect VM hostnames
3 Bind every endpoint with a zone group, never a manual A record Azure owns the record lifecycle; no orphans
4 Enforce binding with a DINE policy at the landing-zone management group Removes the human “remember the zone group” step
5 Run a remediation task after assigning the policy DINE doesn’t fix pre-existing endpoints automatically
6 Put the DNS-proxy/firewall VNet in the same link for_each as spokes Prevents the estate-wide failure on a rebuild
7 Manage all zone links in IaC; audit-policy any link created out-of-band The one-off az link is the classic SPOF
8 Deploy the DNS Private Resolver (not VMs) with two delegated /28s Managed, HA, no patching; plan IP space up front
9 Forward the public suffix from on-prem to the inbound IP — never privatelink.* Lets Azure follow the CNAME internally
10 Verify regional/special zone names (KV vaultcore, AKS region, AMPLS set) against the docs Wrong name writes records nowhere useful
11 Schedule an orphan/link audit (diff records vs endpoints, links vs VNets) Surfaces drift before a user files a ticket
12 Test resolution from both a spoke VM and an on-prem host after every change The four-quadrant table catches half-broken states

Security notes

Private endpoints exist for security; the DNS layer is where that security quietly succeeds or fails.

Control What to do Why
Disable public network access on the PaaS resource --public-network-access Disabled on storage/KV/SQL Without this, the public endpoint stays reachable even with a PE
Least-privilege on the zone Grant the DINE identity Private DNS Zone Contributor on the central zones only, and scope its Network Contributor no wider than the policy assignment Avoid broad Network Contributor at subscription scope
RBAC the connectivity subscription tightly Only the platform team writes zones/links DNS is now a shared, estate-wide control plane
Audit zone links as code Deny/audit links created outside the pipeline A rogue or deleted link silently breaks/leaks resolution
Approve PE connections deliberately Use manual approval for cross-tenant/3rd-party PEs Auto-approval can expose a resource you didn’t intend
Keep the resolver inbound IP private Reachable only over ExpressRoute/VPN No public exposure of the DNS bridge
Don’t forward privatelink.* from on-prem Forward only public suffixes Prevents leaking internal naming and broken resolution
Monitor for public-IP regressions Alert if a known PE FQDN ever resolves public Catches a dropped link before data takes the public path

The subtle security failure mode: a resolution bug doesn’t open a port — it sends your “private” PaaS traffic out the public path, where the PaaS firewall denies it (fail-closed, the good case) or, if public access was never disabled, silently allows it (fail-open, the data-exfiltration case). Disabling public network access turns every DNS regression into a loud failure instead of a silent leak.

Cost & sizing

The DNS layer itself is cheap; the cost conversation is mostly about the private endpoints and the resolver. Rough figures (verify current pricing for your region):

Item Unit Rough cost (USD) Rough cost (INR) Notes
Private DNS zone Per zone / month ~$0.50 ~₹42 ~40 service zones max — negligible
Private DNS queries Per million queries ~$0.40 ~₹33 Most estates are well within noise
VNet link Per link Free Free Link freely — no per-link charge
Private endpoint Per endpoint / hour ~$0.01/hr (~$7.30/mo) ~₹600/mo The real cost driver at hundreds of PEs
PE data processing Per GB ~$0.01/GB ~₹0.83/GB Inbound + outbound through the PE
DNS Private Resolver endpoint Per endpoint / hour ~$0.10/hr each ~₹8/hr each Two endpoints (in + out) in the hub
DNS Private Resolver queries Per million ~$0.40 ~₹33 Only on-prem-bound/forwarded queries

Sizing guidance, by estate scale:

Estate size Private endpoints Zones VNet links Resolver needed? Dominant cost
Single workload 2–10 2–4 1–2 No (in-Azure only) The endpoints
Small landing zone 20–80 5–10 5–15 If hybrid The endpoints
Large landing zone 200–800 10–20 40–200 Yes (hybrid) The endpoints + resolver
Multi-region 800+ per-region copies up to 1,000/zone Yes, per region Endpoints + regional resolvers

The cost levers worth knowing: VNet links are free, so never economize on linking; the private endpoints dominate the bill, so consolidate where a single endpoint with multiple subresources suffices; and the resolver’s two endpoints are a fixed cost of roughly $4.80/day at the table’s ~$0.10/hr each (2 × $0.10 × 24 h), regardless of estate size, which is small against hundreds of endpoints. There is no free tier for private endpoints, but the lab in this article runs for well under a dollar in an hour and tears down cleanly.
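For a quick estimate, the hourly figures from the cost table can be multiplied out. This sketch uses the table's rough rates (~$0.01/hr per private endpoint, ~$0.10/hr per resolver endpoint) and 730 hours per month, the figure implied by the table's $7.30/month per endpoint. estate_monthly_usd is a hypothetical helper, and it ignores per-GB processing, query and zone charges.

```shell
# Rough monthly USD for N private endpoints and M resolver endpoints,
# using the table's approximate hourly rates. Not a pricing calculator.
estate_monthly_usd() {
  awk -v pe="$1" -v re="$2" 'BEGIN {
    hours = 730                       # hours per month implied by the table
    printf "%.2f\n", pe * 0.01 * hours + re * 0.10 * hours
  }'
}

estate_monthly_usd 300 2   # prints 2336.00
```

For 300 endpoints and the two resolver endpoints, the endpoints contribute 2190.00 and the resolver 146.00, which shows where the bill concentrates under these rates.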

Interview & exam questions

Mapped to AZ-700 (Designing and Implementing Microsoft Azure Networking Solutions), AZ-305 (Designing Microsoft Azure Infrastructure Solutions) and AZ-104 (Microsoft Azure Administrator).

1. Why does a private endpoint require Private DNS, when it already has a private IP? Because the application connects by the public FQDN (baked into SDKs, connection strings and certificates), not the IP. Without a Private DNS zone holding the private A record, that FQDN resolves to the public IP and the endpoint is bypassed. DNS is the only thing that redirects the name to the private IP.

2. What is the full resolution chain for mystorageacct.blob.core.windows.net with a private endpoint? Public Azure DNS returns a CNAME to mystorageacct.privatelink.blob.core.windows.net; that name is resolved by your hosted privatelink.blob.core.windows.net zone, which holds an A record to the endpoint’s private IP. If the client can’t see that zone, it falls through to the public A record.

3. Why use a zone group instead of creating the A record manually? A zone group makes Azure manage the record’s entire lifecycle — write on creation, update on IP change, delete on endpoint deletion. Manual records become stale or orphaned the moment a resource is redeployed, and they’re not policy-enforceable.

4. How do you avoid forty copies of the same privatelink zone across forty spokes? Host one copy in the connectivity subscription and create a VNet link per spoke (registration-enabled false). A VNet’s default resolver consults every linked zone, so one zone serves all spokes with no duplication.

5. What does registration-enabled false mean and why is it required here? It makes the link resolution-only: the VNet can read the zone’s records but doesn’t auto-register VM hostnames into it. A VNet can have auto-registration enabled on only one of its zone links, and auto-registration has no place in a shared privatelink zone.

6. How do you enforce that every new endpoint gets the correct zone group automatically? Assign a DeployIfNotExists (DINE) Azure Policy at the landing-zone management group, parameterized with the central zone’s resource ID. It auto-creates the zone group for new endpoints. Pre-existing endpoints need a remediation task.

7. Which two RBAC roles does the DINE managed identity need, and why? Network Contributor (to create the privateDnsZoneGroups child object on the endpoint) and Private DNS Zone Contributor on the central zone (to write the A record). If either is missing, the policy deployment fails.

8. How do on-prem clients resolve a private endpoint, given 168.63.129.16 is unreachable from on-prem? Deploy the Azure DNS Private Resolver with an inbound endpoint, and configure on-prem conditional forwarders to send the public suffix (e.g. blob.core.windows.net) to that inbound IP. Resolution then happens inside Azure where the linked zones are visible.

9. Which name do you forward from on-prem — the public suffix or the privatelink name? The public suffix. Azure follows the CNAME into the linked privatelink zone itself; on-prem should never reference the privatelink.* name directly.

10. A storage account behind a private endpoint suddenly returns 403 to an app. First check? nslookup the FQDN from the app’s host. A public IP means resolution broke (missing link or zone group) and the storage firewall denied the public source — fix the DNS, not the storage ACL. A private IP that times out is a network (NSG/UDR) problem instead.

11. Why is a DNS-proxy NVA in the hub a single point of failure for private resolution? Because the resolver consults only the zones linked to the VNet where the query is resolved — and with a proxy, that’s the NVA’s VNet, not the spokes’. If the NVA’s VNet loses its zone links, every spoke that forwards through it resolves public at once.

12. Name two services with non-obvious Private DNS zone names. Key Vault uses privatelink.vaultcore.azure.net (not vault.azure.net); AKS private clusters use a regional privatelink.<region>.azmk8s.io; Azure Monitor Private Link Scope needs a set of zones (monitor, oms, ods, agentsvc, blob).
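The resolution chain from questions 2, 4 and 11 can be sketched as a toy model. This is an illustration only, not how Azure DNS is implemented: the VNet names, the IPs and the `resolve` helper are hypothetical, and the missing-record case is simplified to fall through to the public answer the way the text describes it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class PrivateZone:
    name: str                   # e.g. "privatelink.blob.core.windows.net"
    records: Dict[str, str]     # host label -> private IP (written by the zone group)
    linked_vnets: Set[str] = field(default_factory=set)


# Hypothetical sample data: names and addresses are illustrative only.
PUBLIC_CNAME = {
    "mystorageacct.blob.core.windows.net":
        "mystorageacct.privatelink.blob.core.windows.net",
}
PUBLIC_A = {"mystorageacct.privatelink.blob.core.windows.net": "203.0.113.10"}

ZONES: List[PrivateZone] = [
    PrivateZone(
        name="privatelink.blob.core.windows.net",
        records={"mystorageacct": "10.20.1.4"},
        linked_vnets={"hub-vnet", "spoke-01"},
    )
]


def resolve(fqdn: str, resolving_vnet: str) -> Optional[str]:
    """Return the IP a query yields when it is resolved inside resolving_vnet."""
    # Step 1: public DNS hands back the privatelink CNAME, if one exists.
    target = PUBLIC_CNAME.get(fqdn, fqdn)
    # Step 2: only zones linked to the resolving VNet are consulted.
    for zone in ZONES:
        suffix = "." + zone.name
        if resolving_vnet in zone.linked_vnets and target.endswith(suffix):
            label = target[: -len(suffix)]
            if label in zone.records:
                return zone.records[label]
    # Step 3: nothing private matched, so the public answer comes back
    # (a simplification that follows the text's description).
    return PUBLIC_A.get(target)


if __name__ == "__main__":
    name = "mystorageacct.blob.core.windows.net"
    print(resolve(name, "spoke-01"))  # linked spoke
    print(resolve(name, "spoke-02"))  # spoke with no VNet link
```

In this model a spoke that forwards through a DNS proxy would pass the proxy's VNet as `resolving_vnet`, which is why removing `hub-vnet` from `linked_vnets` changes the answer for every spoke behind it.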

Quick check

  1. With no Private DNS zone in place, what does mystorageacct.blob.core.windows.net resolve to from a spoke VM, and what happens to the traffic?
  2. You created a private endpoint but the FQDN still resolves to a public IP, even though the zone is linked. What single object is most likely missing?
  3. Why must on-prem conditional forwarders target the public suffix (e.g. vaultcore.azure.net) and not privatelink.vaultcore.azure.net?
  4. Your estate uses a hub firewall as DNS proxy. After a firewall VNet rebuild, every spoke resolves public. What broke?
  5. A spoke VM resolves the private IP, but the app still can’t connect on 443. Is this a DNS problem? How do you confirm?

Answers

  1. The public IP. The traffic leaves over the internet path (or is denied by the PaaS firewall) and the private endpoint is bypassed — DNS is the only thing that would have redirected the name to the private IP.
  2. The zone group on the private endpoint. Without it, no A record is written into the zone, so even a linked zone has nothing to return — resolution falls through to the public record.
  3. Because Azure’s resolver follows the public name’s CNAME into the linked privatelink zone itself; on-prem should resolve the public name inside Azure and never reference the internal privatelink.* plumbing. Forwarding privatelink.* breaks the chain.
  4. The firewall (proxy) VNet lost its zone links. The resolver consults the zones linked to the VNet where it resolves (the proxy’s VNet), and a clean rebuild dropped the manually created links, so the resolver had no privatelink zones to follow the CNAME into.
  5. No — a private IP in the answer means DNS is correct. Confirm with nc -vz <privateIP> 443 from the spoke; a failure points at NSG/UDR/peering/firewall, not the zone.
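The decision rule behind answers 1, 2 and 5 (public answer means DNS, private answer that will not connect means network) can be written as a small triage helper. This is a minimal sketch using only the Python standard library as a stand-in for nslookup and nc; the function names and messages are hypothetical, and it assumes your endpoints sit in RFC 1918 address space.

```python
import ipaddress
import socket
from typing import List, Optional

# Assumption: private endpoints use RFC 1918 space in this estate.
RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918)


def lookup(fqdn: str) -> List[str]:
    """IPv4 addresses the local resolver returns for fqdn (empty on failure)."""
    try:
        infos = socket.getaddrinfo(fqdn, 443, family=socket.AF_INET,
                                   type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def tcp_check(ip: str, port: int = 443, timeout: float = 3.0) -> bool:
    """True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def triage(resolved_ips: List[str], tcp_ok: Optional[bool]) -> str:
    """Classify a failure from the DNS answer and an optional TCP probe result."""
    if not resolved_ips:
        return "dns: no answer; check the resolver path and forwarders"
    if not all(is_rfc1918(ip) for ip in resolved_ips):
        return "dns: public answer; check the VNet link and the zone group"
    if tcp_ok is False:
        return "network: private answer, port unreachable; check NSG/UDR/peering/firewall"
    return "ok: private answer"


if __name__ == "__main__":
    fqdn = "mystorageacct.blob.core.windows.net"  # hypothetical account name
    ips = lookup(fqdn)
    private = bool(ips) and all(is_rfc1918(ip) for ip in ips)
    print(ips, triage(ips, tcp_check(ips[0]) if private else None))
```

Run it from the host that is failing, not from your workstation: the answer depends on which VNet's resolver handles the query.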

Glossary

| Term | Definition |
| --- | --- |
| Private endpoint | A NIC with a private IP that projects a PaaS resource into your VNet via Private Link. |
| Subresource / group-id | The specific sub-service a private endpoint targets (e.g. blob, vault, sqlServer). |
| Private Link | The Azure backbone path that carries traffic to a private endpoint without traversing the internet. |
| privatelink.* zone | The Private DNS zone you host that contains the private A records for endpoints of a service. |
| Private DNS zone | A zone hosted in Azure (not internet-published) resolved by VNet default DNS when linked. |
| A record | The DNS record mapping the privatelink FQDN to the endpoint’s private IP. |
| Zone group | A child object of a private endpoint that makes Azure manage the A record’s lifecycle. |
| VNet link | The binding that makes a Private DNS zone resolvable from a given virtual network. |
| Registration-enabled | A link flag; true auto-registers VM hostnames (a VNet may have only one such link), false is resolution-only. |
| 168.63.129.16 | Azure’s per-VNet wire-server resolver that auto-consults all linked Private DNS zones. |
| DeployIfNotExists (DINE) | An Azure Policy effect that auto-deploys a missing resource (here, the zone group). |
| Remediation task | A Policy job that brings pre-existing non-compliant resources into compliance. |
| DNS Private Resolver | A managed Azure service that forwards DNS between on-prem and Azure without DNS VMs. |
| Inbound endpoint | A private IP on the resolver that on-prem DNS conditionally forwards queries to. |
| Outbound endpoint | The resolver’s egress point for queries Azure forwards out to on-prem DNS. |
| Forwarding ruleset | A set of domain→target-DNS rules applied (via VNet links) to govern Azure→on-prem resolution. |
| Conditional forwarder | An on-prem DNS rule sending a specific suffix to a chosen DNS server (here, the inbound IP). |
| Split-brain DNS | Non-deterministic resolution caused by a name existing in two zones, both of which the client can see. |
| DNS-proxy NVA | A firewall/appliance resolving DNS on the spokes’ behalf; its VNet must carry all zone links. |

Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
