Private Endpoints and Private DNS at Scale: A Hub-and-Spoke Resolution Architecture

One private endpoint is easy. Three hundred of them across forty spokes, with on-prem clients that also need to resolve them, is an architecture problem. Get the DNS design wrong early and you inherit zone sprawl, split-brain resolution, and a steady drip of “the app can’t reach the storage account” tickets. This is how to centralize it correctly the first time.

1. Why private endpoints break name resolution

A private endpoint projects a NIC for a PaaS resource into your VNet and gives it a private IP. The problem is the name. Your application still connects to the public FQDN — mystorageacct.blob.core.windows.net, myvault.vault.azure.net — because that name is baked into SDKs, connection strings, and certificates.

Resolve that public FQDN with no private DNS in place and you get the public IP. Traffic leaves over the internet path (or is blocked by the firewall on the PaaS resource) and the private endpoint is never used. The fix is the chain Azure builds for you:

mystorageacct.blob.core.windows.net
  -> CNAME mystorageacct.privatelink.blob.core.windows.net
       -> A 10.x.x.x   (only resolvable if you host the privatelink zone)

Microsoft’s public DNS already returns the privatelink.* CNAME. Your job is to host the privatelink.blob.core.windows.net Private DNS zone with an A record that maps the resource’s name (mystorageacct) to its private endpoint’s private IP. If the client can see that zone, it follows the CNAME to your private A record. If it cannot, it falls through to the public A record. Every failure mode in this article is a variation of “the client could not see the right zone.”
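The chain can be sketched as a toy resolver to make the two outcomes explicit. This is a minimal illustration, not how Azure DNS is implemented: the dictionaries, the IP addresses, and the resolve function are all hypothetical stand-ins.

```python
from typing import Optional, Set

# Hypothetical data standing in for real DNS; names and IPs are placeholders.
PUBLIC_CNAME = {
    "mystorageacct.blob.core.windows.net":
        "mystorageacct.privatelink.blob.core.windows.net",
}
PUBLIC_A = {"mystorageacct.privatelink.blob.core.windows.net": "203.0.113.10"}
PRIVATE_ZONES = {
    "privatelink.blob.core.windows.net": {"mystorageacct": "10.1.2.4"},
}


def resolve(fqdn: str, linked_zones: Set[str]) -> Optional[str]:
    """Follow the public CNAME, then answer from a linked private zone if one exists."""
    target = PUBLIC_CNAME.get(fqdn, fqdn)
    host, _, zone = target.partition(".")
    if zone in linked_zones:
        # The client can see the zone, so the answer comes from it.
        # In this sketch a missing record yields None rather than a public answer.
        return PRIVATE_ZONES.get(zone, {}).get(host)
    # The client cannot see the zone: it falls through to the public record.
    return PUBLIC_A.get(target)


if __name__ == "__main__":
    name = "mystorageacct.blob.core.windows.net"
    print(resolve(name, {"privatelink.blob.core.windows.net"}))  # client in a linked VNet
    print(resolve(name, set()))                                   # client with no link
```

The only input that changes between the two calls is the set of zones the client can see, which is the variable every later section is trying to control.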

2. Zone groups beat manual A records

You can create the A record by hand. Do not. A Private DNS zone group binds a private endpoint to one or more zones so Azure manages the A record lifecycle for you: it writes the record on creation, updates it if the IP changes, and deletes it when the endpoint is deleted. Manual records rot the moment someone redeploys.

# Create the private endpoint (storage blob example)
az network private-endpoint create \
  --name pe-stblob-app1 \
  --resource-group rg-app1 \
  --vnet-name vnet-spoke-app1 --subnet snet-pe \
  --private-connection-resource-id "$STORAGE_ID" \
  --group-id blob \
  --connection-name conn-stblob-app1

# Bind it to the centralized zone via a zone group
az network private-endpoint dns-zone-group create \
  --resource-group rg-app1 \
  --endpoint-name pe-stblob-app1 \
  --name default \
  --private-dns-zone "$ZONE_ID_BLOB" \
  --zone-name privatelink-blob

The --private-dns-zone here is a full resource ID. That ID can point at a zone in a different subscription — which is exactly how we centralize. The spoke owns the endpoint; the connectivity subscription owns the zone.

The --group-id (sometimes called the subresource) is per service: blob, file, table, queue, dfs for storage; vault for Key Vault; sqlServer for Azure SQL; mariadbServer, postgresqlServer, and so on. One resource can need several: a storage account using blob and file needs two endpoints, one per subresource, each with its own zone group.

3. Centralize zones in the connectivity subscription

The anti-pattern is one set of privatelink.* zones per spoke. With forty spokes you would have forty copies of privatelink.blob.core.windows.net, each a separate place for records to drift. Instead, host one copy of each zone in the connectivity (hub) subscription and project it into every spoke with a VNet link.

HUB_RG="rg-connectivity-dns"

# One zone, hosted centrally
az network private-dns zone create \
  --resource-group $HUB_RG \
  --name privatelink.blob.core.windows.net

# Link every VNet that needs to resolve it.
# registration-enabled=false: this is a resolution-only link.
az network private-dns link vnet create \
  --resource-group $HUB_RG \
  --zone-name privatelink.blob.core.windows.net \
  --name link-spoke-app1 \
  --virtual-network "$SPOKE_APP1_VNET_ID" \
  --registration-enabled false

A VNet’s default resolver (168.63.129.16) automatically consults every Private DNS zone linked to that VNet. So a VM in vnet-spoke-app1 querying the blob FQDN walks: public CNAME to privatelink.*, then the linked central zone returns the private A record. No forwarders, no resolver, no custom DNS on the spoke — for VNet-internal clients. Hybrid clients are section 5.

Set registration-enabled false on these links. Auto-registration is for VM hostnames in a VNet’s own zone; it has no place in a shared privatelink zone, and a VNet can be the registration VNet for only one private zone anyway, so enabling it here would use up that slot.

In Terraform the central-plus-many-links pattern is just a for_each:

locals {
  privatelink_zones = [
    "privatelink.blob.core.windows.net",
    "privatelink.file.core.windows.net",
    "privatelink.vaultcore.azure.net",
    "privatelink.database.windows.net",
  ]
}

resource "azurerm_private_dns_zone" "zones" {
  for_each            = toset(local.privatelink_zones)
  name                = each.value
  resource_group_name = azurerm_resource_group.dns.name
}

# Cartesian product: every zone linked to every spoke VNet
resource "azurerm_private_dns_zone_virtual_network_link" "links" {
  for_each = {
    for pair in setproduct(local.privatelink_zones, keys(var.spoke_vnets)) :
    "${pair[0]}|${pair[1]}" => { zone = pair[0], vnet = pair[1] }
  }
  name                  = "link-${each.value.vnet}"
  resource_group_name   = azurerm_resource_group.dns.name
  private_dns_zone_name = azurerm_private_dns_zone.zones[each.value.zone].name
  virtual_network_id    = var.spoke_vnets[each.value.vnet]
  registration_enabled  = false
}

4. Policy-enforced private DNS

Manual zone groups do not scale across teams. The moment a spoke owner creates an endpoint and forgets the zone group, you have a silent public-resolution bug. Azure Policy with a deployIfNotExists (DINE) effect closes the gap: it watches for new private endpoints and auto-creates the zone group pointing at your central zone.

Microsoft ships built-in DINE policies. Search for “Configure private endpoints … to use private DNS zones” in the policy catalog (resource type Microsoft.Authorization/policyDefinitions). There is one definition per service (for example Blob, Key Vault, SQL); you typically group them into an initiative assigned at the landing-zone management group, each parameterized with the central zone’s resource ID.

The shape of the rule, so you know what it is doing:

{
  "if": {
    "allOf": [
      { "field": "type", "equals": "Microsoft.Network/privateEndpoints" },
      {
        "count": {
          "field": "Microsoft.Network/privateEndpoints/privateLinkServiceConnections[*].groupIds[*]",
          "where": { "field": "...groupIds[*]", "equals": "blob" }
        },
        "greaterOrEquals": 1
      }
    ]
  },
  "then": {
    "effect": "deployIfNotExists",
    "details": {
      "type": "Microsoft.Network/privateEndpoints/privateDnsZoneGroups",
      "roleDefinitionIds": [
        "/providers/Microsoft.Authorization/roleDefinitions/4d97b98b-1d4f-4787-a291-c67834d212e7"
      ],
      "deployment": { "properties": { "..." : "ARM template that creates the zone group" } }
    }
  }
}
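To make the if block concrete, here is a small sketch that evaluates the same condition against a resource represented as a Python dict. It is an illustration of the matching logic only, not the Azure Policy engine; the function name and the sample resource shape are assumptions.

```python
from typing import Any, Dict


def matches_rule(resource: Dict[str, Any], group_id: str = "blob") -> bool:
    """Mimic the rule's `if`: type is privateEndpoints AND count(groupIds == group_id) >= 1."""
    # Case is folded on both sides; this sketch assumes case-insensitive `equals`.
    if resource.get("type", "").casefold() != "microsoft.network/privateendpoints":
        return False
    connections = resource.get("properties", {}).get("privateLinkServiceConnections", [])
    count = sum(
        1
        for conn in connections
        for gid in conn.get("properties", {}).get("groupIds", [])
        if gid.casefold() == group_id.casefold()
    )
    return count >= 1


if __name__ == "__main__":
    # Hypothetical private endpoint, trimmed to the fields the rule inspects.
    endpoint = {
        "type": "Microsoft.Network/privateEndpoints",
        "properties": {
            "privateLinkServiceConnections": [{"properties": {"groupIds": ["blob"]}}]
        },
    }
    print(matches_rule(endpoint))           # matches the blob rule
    print(matches_rule(endpoint, "vault"))  # would be handled by a different assignment
```

Because each definition matches a single group ID, an endpoint for a service with no assigned definition passes through untouched, which is why the initiative needs one entry per service you actually use.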

The role definition ID above is Network Contributor. The policy assignment’s managed identity needs it (plus Private DNS Zone Contributor on the zone) to write the zone group and record.

5. On-prem and hybrid resolution with the Private Resolver

VNet clients are solved. On-prem clients are not: a server in your datacenter querying mystorageacct.blob.core.windows.net hits its own DNS, gets the public CNAME, and has no way to reach 168.63.129.16 — that address is non-routable outside Azure. You need an in-Azure resolver that on-prem can forward to.

The modern answer is the Azure DNS Private Resolver, a managed service (no DNS VMs to patch). Deploy it in the hub with an inbound endpoint (an IP on-prem forwards to) and an outbound endpoint (for queries Azure sends back out to on-prem).

RESOLVER_RG="rg-connectivity-dns"

az dns-resolver create \
  --name dnspr-hub \
  --resource-group $RESOLVER_RG \
  --location eastus2 \
  --id "$HUB_VNET_ID"

# Inbound: gets a private IP in a dedicated /28 subnet delegated to the resolver
az dns-resolver inbound-endpoint create \
  --dns-resolver-name dnspr-hub \
  --resource-group $RESOLVER_RG \
  --name inbound \
  --location eastus2 \
  --ip-configurations "[{private-ip-allocation-method:Dynamic,id:$INBOUND_SUBNET_ID}]"

# Outbound: needs its own delegated /28 subnet
az dns-resolver outbound-endpoint create \
  --dns-resolver-name dnspr-hub \
  --resource-group $RESOLVER_RG \
  --name outbound \
  --location eastus2 \
  --id "$OUTBOUND_SUBNET_ID"

Both endpoints require dedicated subnets delegated to Microsoft.Network/dnsResolvers, minimum /28. Plan IP space for this in the hub up front.
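For the IP planning step, a quick sketch with Python's ipaddress module can carve the two subnets out of a reserved hub block and reject anything smaller than /28. The hub range 10.10.0.0/24 and the function name are assumptions for illustration; a real hub would pick blocks that are actually free.

```python
import ipaddress
from itertools import islice
from typing import Tuple

MAX_PREFIX = 28  # the minimum subnet size stated above is /28


def carve_resolver_subnets(hub_block: str, prefix: int = 28) -> Tuple[
    ipaddress.IPv4Network, ipaddress.IPv4Network
]:
    """Return (inbound, outbound) subnets taken from the start of hub_block."""
    if prefix > MAX_PREFIX:
        raise ValueError(f"/{prefix} is smaller than the /{MAX_PREFIX} minimum")
    block = ipaddress.ip_network(hub_block)
    picked = list(islice(block.subnets(new_prefix=prefix), 2))
    if len(picked) < 2:
        raise ValueError(f"{hub_block} cannot hold two /{prefix} subnets")
    return picked[0], picked[1]


if __name__ == "__main__":
    inbound, outbound = carve_resolver_subnets("10.10.0.0/24")
    print("inbound :", inbound)
    print("outbound:", outbound)
    # Azure reserves the first four addresses and the last one in every subnet,
    # so the first address an endpoint can receive is network + 4.
    print("first assignable inbound IP:", inbound.network_address + 4)
```

Reserving both ranges together, even if the outbound endpoint is deployed later, avoids having to re-address the hub once other subnets have filled the space.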

Conditional forwarding rulesets (Azure to on-prem)

When an Azure workload needs to resolve an on-prem name (db01.corp.local), the resolver’s outbound endpoint sends it to your on-prem DNS via a forwarding ruleset. Each rule maps a domain to target DNS servers; link the ruleset to the VNets that should obey it.

az dns-resolver forwarding-ruleset create \
  --name frs-onprem \
  --resource-group $RESOLVER_RG \
  --location eastus2 \
  --outbound-endpoints "[{id:$OUTBOUND_ENDPOINT_ID}]"

az dns-resolver forwarding-rule create \
  --ruleset-name frs-onprem \
  --resource-group $RESOLVER_RG \
  --name rule-corp-local \
  --domain-name "corp.local." \
  --forwarding-rule-state Enabled \
  --target-dns-servers "[{ip-address:10.50.0.10,port:53},{ip-address:10.50.0.11,port:53}]"

az dns-resolver vnet-link create \
  --ruleset-name frs-onprem \
  --resource-group $RESOLVER_RG \
  --name link-hub \
  --id "$HUB_VNET_ID"

The trailing dot on corp.local. is mandatory — these are fully qualified domain names.
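A short sketch of how a rule's domain is compared with a query may help when debugging rules that never fire. It assumes the most specific matching suffix wins and that names are compared as lowercase FQDNs; the helper names and the second rule are hypothetical.

```python
from typing import Dict, List, Optional


def to_fqdn(name: str) -> str:
    """Lowercase a name and make sure it ends with the root dot."""
    name = name.strip().lower()
    return name if name.endswith(".") else name + "."


def pick_rule(query: str, rules: Dict[str, List[str]]) -> Optional[str]:
    """Return the rule domain that is the longest suffix of query, or None."""
    q = to_fqdn(query)
    best: Optional[str] = None
    for domain in rules:
        d = to_fqdn(domain)
        # A rule matches the domain itself and anything beneath it; "." matches all.
        if d == "." or q == d or q.endswith("." + d):
            if best is None or len(d) > len(to_fqdn(best)):
                best = domain
    return best


if __name__ == "__main__":
    rules = {
        "corp.local.": ["10.50.0.10", "10.50.0.11"],
        "eu.corp.local.": ["10.60.0.10"],  # hypothetical second rule
    }
    for name in ("db01.corp.local", "db01.eu.corp.local", "db01.notcorp.local"):
        print(name, "->", pick_rule(name, rules))
```

Note the label-boundary check: notcorp.local does not match corp.local. even though the strings share an ending.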

6. ExpressRoute / VPN inbound resolution (on-prem to Azure)

This is the reverse direction and the one teams forget. For on-prem clients to resolve private endpoints, point a conditional forwarder on your on-prem DNS at the resolver’s inbound endpoint IP, for the public DNS suffixes of the PaaS services.

The subtlety: you forward the public zone names (blob.core.windows.net, vaultcore.azure.net), not the privatelink.* names. On-prem asks for mystorageacct.blob.core.windows.net; the inbound endpoint resolves it inside Azure, where 168.63.129.16 follows the CNAME into your linked privatelink zone and returns the private IP. On-prem never references privatelink directly.

On Windows Server DNS, one forwarder per suffix:

$inbound = "10.10.0.4"   # resolver inbound endpoint IP

"blob.core.windows.net",
"file.core.windows.net",
"vaultcore.azure.net",
"database.windows.net" | ForEach-Object {
  Add-DnsServerConditionalForwarderZone `
    -Name $_ `
    -MasterServers $inbound `
    -ReplicationScope "Forest"
}

Traffic to the inbound endpoint rides your existing ExpressRoute private peering or VPN — the resolver IP is a normal private address in the hub, reachable over the same routes your workloads already use. No public exposure.
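The "forward the public name, not the privatelink name" rule is easy to encode, so the on-prem forwarder list can be generated from the same zone list that drives the links. This is a sketch under one assumption: that stripping the privatelink. prefix yields the right public suffix. That holds for the four zones used in this article; verify other services against Microsoft's published list before relying on it.

```python
from typing import Iterable, List

PREFIX = "privatelink."


def forwarder_suffix(zone_name: str) -> str:
    """Map a privatelink zone name to the public suffix on-prem should forward."""
    if not zone_name.startswith(PREFIX):
        raise ValueError(f"not a privatelink zone: {zone_name}")
    return zone_name[len(PREFIX):]


def forwarder_list(zone_names: Iterable[str]) -> List[str]:
    """Deduplicated, sorted forwarder suffixes for a set of hosted zones."""
    return sorted({forwarder_suffix(z) for z in zone_names})


if __name__ == "__main__":
    hosted = [
        "privatelink.blob.core.windows.net",
        "privatelink.file.core.windows.net",
        "privatelink.vaultcore.azure.net",
        "privatelink.database.windows.net",
    ]
    for suffix in forwarder_list(hosted):
        print(suffix)
```

Generating the list this way keeps on-prem forwarders in step with the zones hosted in the hub, so adding a zone without its forwarder becomes a visible diff rather than a ticket.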

7. Regional zones and the long zone-name list

The most common rollout bug is using the wrong zone name. Several services use regional or non-obvious zone names, and a few use a different suffix entirely. Get the name wrong and the zone group silently writes records nowhere useful. Reference values:

| Service | Subresource (group-id) | Private DNS zone name |
| --- | --- | --- |
| Blob storage | blob | privatelink.blob.core.windows.net |
| File storage | file | privatelink.file.core.windows.net |
| Key Vault | vault | privatelink.vaultcore.azure.net |
| Azure SQL DB | sqlServer | privatelink.database.windows.net |
| Cosmos DB (SQL) | Sql | privatelink.documents.azure.com |
| App Service / Functions | sites | privatelink.azurewebsites.net |
| AKS API server | management | privatelink.<region>.azmk8s.io |
| Azure Monitor (AMPLS) | azuremonitor | privatelink.monitor.azure.com (+ several companion zones) |

Note the Key Vault suffix is vaultcore.azure.net, not vault.azure.net. Azure Monitor Private Link Scope needs a set of zones (monitor, oms, ods, agentsvc, blob) together, not one. AKS private clusters and some services embed the region in the zone name. When you are unsure, the authoritative list is Microsoft’s “Azure Private Endpoint DNS configuration” doc — treat it as the source of truth and do not guess.
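One way to keep pipelines from guessing is to hold the reference values in a single lookup that fails loudly on anything it does not know. The sketch below contains only the rows from the table above; the function name and the {region} placeholder syntax are assumptions, and the Azure Monitor entry lists just the primary zone.

```python
from typing import Optional

# Values copied from the reference table; extend only from Microsoft's published list.
ZONE_BY_GROUP_ID = {
    "blob": "privatelink.blob.core.windows.net",
    "file": "privatelink.file.core.windows.net",
    "vault": "privatelink.vaultcore.azure.net",
    "sqlServer": "privatelink.database.windows.net",
    "Sql": "privatelink.documents.azure.com",
    "sites": "privatelink.azurewebsites.net",
    "management": "privatelink.{region}.azmk8s.io",   # regional zone
    "azuremonitor": "privatelink.monitor.azure.com",  # companion zones not listed here
}


def zone_for(group_id: str, region: Optional[str] = None) -> str:
    """Return the zone name for a group ID, refusing to guess unknown ones."""
    if group_id not in ZONE_BY_GROUP_ID:
        raise KeyError(f"no zone recorded for group-id {group_id!r}; check the docs")
    template = ZONE_BY_GROUP_ID[group_id]
    if "{region}" in template:
        if not region:
            raise ValueError(f"group-id {group_id!r} needs a region")
        return template.format(region=region)
    return template


if __name__ == "__main__":
    print(zone_for("vault"))
    print(zone_for("management", region="eastus2"))
```

Raising on an unknown group ID is deliberate: a pipeline that stops is cheaper than a zone group that writes records into the wrong zone.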

Sovereign and Government clouds use entirely different suffixes (*.core.usgovcloudapi.net, etc.). If you run in those clouds, derive names from that cloud’s documentation.

Enterprise scenario

A regulated insurer ran a Palo Alto NVA in the hub as DNS proxy for all forty spokes. AKS private clusters resolved fine, storage worked, then every workload broke for an hour after a “routine” firewall rebuild. The cause: the NVA’s own VNet had been linked to the privatelink zones manually, outside IaC. The rebuild redeployed the firewall VNet from a clean template — the zone links were never recreated, so the proxy upstream (168.63.129.16) could no longer see any privatelink zone. Spokes pointed DNS at the firewall, the firewall resolved against Azure default DNS, and Azure default DNS in that VNet had no zones linked. Public IPs came back everywhere at once.

The fix was twofold. First, move the firewall VNet’s zone links into the same for_each that links the spokes, so the DNS-proxy VNet is never special-cased:

locals {
  dns_resolving_vnets = merge(var.spoke_vnets, {
    "hub-firewall" = var.firewall_vnet_id
  })
}

That single merge feeds the existing setproduct link resource, guaranteeing the proxy VNet gets every zone the spokes get. Second, we added a deny-style Azure Policy on Microsoft.Network/privateDnsZones/virtualNetworkLinks audited against an allowlist, so any link created or deleted outside the pipeline raises a non-compliant flag within minutes. The deeper lesson: when spokes resolve through a DNS-proxy NVA, that NVA’s VNet is the single point of failure for all private resolution — it must carry the full zone-link set, and that set belongs in code, not in a one-off az command someone ran during a migration.

Verify

Resolution is binary and easy to test. From a VM in a spoke, the FQDN must resolve to a private (RFC 1918) address:

nslookup mystorageacct.blob.core.windows.net
# Expect:
#   ...canonical name = mystorageacct.privatelink.blob.core.windows.net
#   Address: 10.x.x.x        <- private. Public IP here = broken.

Confirm the central zone actually holds the record, and audit which VNets are linked:

# Record exists and points at the endpoint's private IP?
az network private-dns record-set a list \
  --resource-group $HUB_RG \
  --zone-name privatelink.blob.core.windows.net -o table

# Which VNets can resolve this zone?
az network private-dns link vnet list \
  --resource-group $HUB_RG \
  --zone-name privatelink.blob.core.windows.net \
  --query "[].{name:name, vnet:virtualNetwork.id, reg:registrationEnabled}" -o table

# Endpoint approved and connected?
az network private-endpoint show \
  --name pe-stblob-app1 --resource-group rg-app1 \
  --query "privateLinkServiceConnections[0].privateLinkServiceConnectionState" -o json

From on-prem, the same nslookup against the public FQDN must also return the private IP — proving the conditional forwarder reaches the inbound endpoint. If on-prem returns the public IP but the spoke returns private, the forwarder or the route to the inbound endpoint is the problem, not the zone.
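The same check can be scripted so it runs from a spoke VM or an on-prem host in a pipeline. A minimal sketch using only the Python standard library follows; the function names are assumptions, and it tests for RFC 1918 ranges explicitly rather than relying on a broader notion of "private".

```python
import ipaddress
import socket
from typing import Iterable, Set

RFC1918 = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(address: str) -> bool:
    """True if address is an IPv4 address inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and any(ip in net for net in RFC1918)


def all_private(addresses: Iterable[str]) -> bool:
    """True only if there is at least one answer and every answer is RFC 1918."""
    answers = list(addresses)
    return bool(answers) and all(is_rfc1918(a) for a in answers)


def lookup_ipv4(fqdn: str) -> Set[str]:
    """Resolve fqdn with the host's configured DNS and return its IPv4 answers."""
    infos = socket.getaddrinfo(fqdn, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


if __name__ == "__main__":
    name = "mystorageacct.blob.core.windows.net"  # placeholder account name
    try:
        answers = lookup_ipv4(name)
    except socket.gaierror as exc:
        print(f"{name}: lookup failed ({exc})")
    else:
        print(name, sorted(answers), "OK" if all_private(answers) else "BROKEN: public answer")
```

Run it from both sides of the ExpressRoute or VPN link: matching results from the spoke and from on-prem confirm the zone and the forwarder path together.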

Rollout checklist

Pitfalls

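Split-brain from leftover spoke zones. If a spoke still hosts its own privatelink.* zone, clients in the VNets linked to that copy get its answers while everyone else gets the central zone’s, so the same name resolves differently depending on where you ask. A VNet cannot be linked to two zones with the same name, so the leftover link also blocks the link to the central zone. Pick the central zone; delete the local copies.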

Orphaned zones and stale records. When a zone group is bypassed or a resource is force-deleted, A records linger and resolve to recycled IPs. Periodically diff record-sets against live endpoints and link lists against live VNets — orphans are a real outage source at scale.

Hub DNS settings. If spokes use the hub’s resolver as custom DNS (common with an NVA firewall doing DNS proxy), the resolver inbound endpoint or the firewall must itself sit in a VNet linked to all the privatelink zones, or central resolution breaks for everyone downstream.

One resource, many subresources. A storage account using blob and file needs records in both zones, which means an endpoint and a zone-group entry per subresource. A single zone group can also reference multiple zones when one endpoint needs several. Make sure each subresource in use has a matching entry, or half your traffic goes public.

Next steps

Wire the zone-creation, links, resolver, and DINE policy into your landing-zone IaC so a new spoke is fully resolvable the moment it is vended — no manual DNS step, ever. Then add a scheduled audit (the two az ... list queries above, diffed against an inventory) so orphaned records and missing links surface before a user files the ticket.
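The audit itself is a pair of set differences. The sketch below assumes the output of the two list queries has already been parsed into plain Python values; the function names, VNet names, and record data are all hypothetical.

```python
from itertools import product
from typing import Dict, Iterable, List, Set, Tuple

Link = Tuple[str, str]  # (zone name, VNet name)


def expected_links(zones: Iterable[str], vnets: Iterable[str]) -> Set[Link]:
    """Every zone linked to every resolving VNet, as in the Terraform setproduct."""
    return set(product(zones, vnets))


def diff_links(expected: Set[Link], actual: Set[Link]) -> Dict[str, List[Link]]:
    """Links that should exist but do not, and links nobody declared."""
    return {
        "missing": sorted(expected - actual),
        "unexpected": sorted(actual - expected),
    }


def stale_records(records: Dict[str, str], live_ips: Set[str]) -> Dict[str, str]:
    """A records whose IP no longer belongs to any live private endpoint."""
    return {name: ip for name, ip in records.items() if ip not in live_ips}


if __name__ == "__main__":
    zones = ["privatelink.blob.core.windows.net"]
    vnets = ["spoke-app1", "hub-firewall"]
    actual = {("privatelink.blob.core.windows.net", "spoke-app1")}
    print(diff_links(expected_links(zones, vnets), actual))
    print(stale_records({"mystorageacct": "10.1.2.4", "oldacct": "10.1.2.9"}, {"10.1.2.4"}))
```

Including the DNS-proxy VNet in the expected set is the point of the exercise: a missing hub-firewall link is exactly the failure described in the enterprise scenario.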
