Private endpoints are a regional construct. An active-active application is not. That mismatch is where most multi-region private connectivity designs quietly break: the architecture diagram shows two symmetric regions, but the private DNS records, the zone links, and the failover behaviour were all reasoned about as if there were one. This article builds the cross-region resolution topology end to end, contrasts the two viable connectivity patterns, and gets the DNS-failover semantics right so a regional cutover does not strand half your fleet on a stale private IP.
The cross-region gap nobody designs for
A private endpoint projects a NIC into a single subnet, in a single VNet, in a single region, and assigns the target PaaS resource one private IP from that subnet. Everything about it is regional. The DNS record it produces — kv-app-eus2.privatelink.vaultcore.azure.net A 10.20.3.7 — is a global fact the moment you publish it into a Private DNS zone, because Private DNS zones are global resources with no region affinity.
That asymmetry creates four failure modes that single-region designs never surface:
- One record, two regions. If East US 2 and West US 2 each have their own endpoint for the same logical resource, the zone holds two A records for closely related names. Resolve the wrong one and your East app crosses regions for every call to what should be a local dependency.
- Cross-region peering tax. A private IP is only routable if there is a path. An endpoint in East US 2 is reachable from West US 2 only over global VNet peering or a connected hub — and that traffic is billed and adds latency on every request.
- Stale records during cutover. Private DNS has its own TTL. Tear down a regional endpoint during failover without thinking about TTL and clients keep dialling a private IP whose NIC no longer exists. That is a black hole, not a failover.
- Hybrid clients. On-prem resolvers forwarding to a single regional DNS Resolver inbound endpoint will happily hand a Frankfurt user the East US private IP, sending the connection back across the ocean.
The rest of this article fixes each of these deliberately.
Step 1 - Lay down a global resolution topology
Private DNS zones are global, so you create each privatelink zone once and link every VNet in every region to it. There is no per-region copy of the zone and no benefit to making one — a second copy only invites record drift.
DNS_RG="rg-connectivity-dns"
ZONE="privatelink.vaultcore.azure.net"
az network private-dns zone create -g "$DNS_RG" -n "$ZONE"
# Link the spoke (or hub) VNet in EACH region to the single global zone.
# registration-enabled false: these are PaaS A records, written by zone groups, not auto-registration.
for region in eus2 wus2; do
az network private-dns link vnet create \
-g "$DNS_RG" -n "link-spoke-$region" \
-z "$ZONE" \
-v "/subscriptions/<sub>/resourceGroups/rg-spoke-$region/providers/Microsoft.Network/virtualNetworks/vnet-spoke-$region" \
--registration-enabled false
done
The critical detail: a VNet using Azure-provided DNS automatically consults every Private DNS zone linked to it, via the platform resolver at 168.63.129.16. So once both regional VNets are linked to the one global zone, a workload in either region can resolve every record in it. Resolution is solved globally by the link topology; the hard part — which we attack next — is making sure each region resolves the right record, the local one.
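The loop above links one zone. A real estate hosts several privatelink zones, and each one needs the same full set of links, so it helps to drive both dimensions from lists. A minimal sketch, assuming the same <sub> placeholder and rg-spoke-<region>/vnet-spoke-<region> naming as above. The zone and region lists are illustrative, and run and link_all_zones are hypothetical helpers that print the commands by default instead of executing them:

```bash
#!/usr/bin/env bash
# Sketch: link every privatelink zone to every regional spoke VNet.
# Assumes the rg-spoke-<region>/vnet-spoke-<region> naming used earlier;
# SUB and both lists are placeholders to adapt.
DNS_RG="rg-connectivity-dns"
SUB="<sub>"
ZONES=(privatelink.vaultcore.azure.net privatelink.database.windows.net)
REGIONS=(eus2 wus2)

# DRY_RUN=1 (the default) prints each command; DRY_RUN=0 executes it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

link_all_zones() {
  local zone region
  for zone in "${ZONES[@]}"; do
    for region in "${REGIONS[@]}"; do
      # registration disabled: zone groups write the PaaS A records.
      run az network private-dns link vnet create \
        -g "$DNS_RG" -n "link-spoke-$region" -z "$zone" \
        -v "/subscriptions/$SUB/resourceGroups/rg-spoke-$region/providers/Microsoft.Network/virtualNetworks/vnet-spoke-$region" \
        --registration-enabled false
    done
  done
}
```

Review the printed commands, then run DRY_RUN=0 link_all_zones to apply them. Link names only have to be unique within a zone, so reusing link-spoke-<region> across zones is fine.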
Step 2 - Resolution topology: regional A records under one zone
Give each regional resource a distinct name so the zone can carry a distinct A record per region. Most PaaS services already do this for you: the resource name is globally unique, so kv-app-eus2 and kv-app-wus2 naturally produce two separate privatelink records.
privatelink.vaultcore.azure.net
├─ kv-app-eus2 A 10.20.3.7 (endpoint NIC in East US 2)
└─ kv-app-wus2 A 10.30.3.7 (endpoint NIC in West US 2)
The public CNAME chain points each public FQDN at its own privatelink alias:
kv-app-eus2.vault.azure.net
└─ CNAME kv-app-eus2.privatelink.vaultcore.azure.net (Azure public DNS)
└─ A 10.20.3.7 (only because YOU host privatelink.vaultcore.azure.net)
Because the names differ, there is no ambiguity and no record collision. The architectural question becomes: how does the East region’s app come to call kv-app-eus2 rather than kv-app-wus2? You never want application code to hard-code a regional FQDN; you want each instance to dial a region-neutral name that resolves locally. That is the job of the two patterns below.
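The public-to-privatelink mapping is mechanical per service, which makes it easy to work out which privatelink record any configured FQDN depends on. A minimal sketch covering only the three services this article uses (Key Vault, Azure SQL, Blob storage); every other service has its own privatelink zone, so treat the case table as deliberately incomplete. privatelink_fqdn is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Sketch: map a public PaaS FQDN to the privatelink name its CNAME points at.
# Covers only Key Vault, Azure SQL and Blob storage; extend the case for others.
privatelink_fqdn() {
  local fqdn="$1"
  local host="${fqdn%%.*}"      # first label, e.g. kv-app-eus2
  local suffix="${fqdn#*.}"     # the rest, e.g. vault.azure.net
  case "$suffix" in
    vault.azure.net)       echo "$host.privatelink.vaultcore.azure.net" ;;
    database.windows.net)  echo "$host.privatelink.database.windows.net" ;;
    blob.core.windows.net) echo "$host.privatelink.blob.core.windows.net" ;;
    *) echo "unsupported suffix: $suffix" >&2; return 1 ;;
  esac
}
```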
Always create the endpoint with a private-dns-zone-group so the A record is written and reaped automatically with the endpoint lifecycle. Hand-written records are a common source of stale entries during failover, because nothing removes them when the endpoint goes away.
az network private-endpoint create \
-g rg-spoke-eus2 -n pe-kv-app-eus2 \
--vnet-name vnet-spoke-eus2 --subnet snet-pe \
--private-connection-resource-id "/subscriptions/<sub>/resourceGroups/rg-data-eus2/providers/Microsoft.KeyVault/vaults/kv-app-eus2" \
--group-id vault --connection-name kv-eus2-conn
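# The zone lives in $DNS_RG, not in the endpoint's resource group, so pass its full resource ID.
az network private-endpoint dns-zone-group create \
-g rg-spoke-eus2 --endpoint-name pe-kv-app-eus2 \
-n zg-vault --zone-name vault \
--private-dns-zone "/subscriptions/<sub>/resourceGroups/$DNS_RG/providers/Microsoft.Network/privateDnsZones/$ZONE"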
Step 3 - Pattern A: regional endpoints with geo-aware steering in front
This is the pattern you want for almost every active-active design. Each region owns its own private endpoint to its own regional PaaS instance, and a steering layer in front hands each client the regional entry point. Crucially, the geo-steering is on the application ingress, not on the private dependency. The private endpoints stay strictly local — East app to East data, West app to West data — and never cross a region for a backend call.
Front Door / Traffic Manager (geo steering)
/ \
app-eus2 (region East) app-wus2 (region West)
| |
pe -> kv-app-eus2 (10.20.3.7) pe -> kv-app-wus2 (10.30.3.7)
pe -> sql-app-eus2 (private) pe -> sql-app-wus2 (private)
The region-neutral indirection lives in app configuration, not DNS: each regional deployment is parameterised with its own dependency FQDNs (KEYVAULT_URI=https://kv-app-eus2.vault.azure.net). The app instance therefore always resolves a name whose privatelink A record is the local endpoint. The single global zone holds both regional records; the name the instance dials decides which one the platform resolver returns.
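Because the indirection lives in configuration, a deployment whose dependency URIs point at another region can be rejected before it ships. A minimal sketch, assuming the hypothetical <name>-<region> convention used throughout (kv-app-eus2, sql-app-wus2). regional_env and assert_regional are illustrative helpers, not part of any Azure tooling:

```bash
#!/usr/bin/env bash
# Sketch: derive each region's dependency settings from one naming convention,
# and reject any URI whose first DNS label does not end in the deployment's region.
# The kv-app-<region> / sql-app-<region> names are a hypothetical convention.
regional_env() {
  local region="$1"
  echo "KEYVAULT_URI=https://kv-app-$region.vault.azure.net"
  echo "SQL_SERVER=sql-app-$region.database.windows.net"
}

assert_regional() {
  local region="$1" line host
  shift
  for line in "$@"; do
    host="${line#*=}"          # drop the KEY= prefix
    host="${host#https://}"    # drop the scheme if present
    host="${host%%.*}"         # keep the first DNS label
    if [ "${host##*-}" != "$region" ]; then
      echo "cross-region dependency: $line" >&2
      return 1
    fi
  done
}
```

Run assert_regional on the generated settings before passing them to az containerapp update --set-env-vars; a shared, unsuffixed vault name such as kv-token fails the check by construction.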
The properties that make Pattern A the default:
| Property | Pattern A outcome |
|---|---|
| Backend path | Always intra-region; no peering hop on the data call |
| Per-request latency | Lowest — local private IP |
| Cross-region data egress | None for backend calls |
| Failover unit | Whole region, drained at the ingress steering layer |
| Blast radius of an endpoint failure | One region |
The trade is that you must run and pay for a private endpoint per resource per region, and you need geo-aware steering at ingress. For an active-active estate that is exactly what you already have. Front Door’s anycast edge fails over per request when a region’s origin health probe goes red; Traffic Manager does it at DNS for non-HTTP front ends. Either way the private layer never has to make a routing decision.
Step 4 - Pattern B: one endpoint reachable cross-region
Sometimes you genuinely have a single regional resource — a primary database, a license server, a stateful component that cannot be regionally duplicated — and both regions must reach it privately. Here you accept exactly one endpoint and make it routable from the far region over global VNet peering (or through connected hubs).
# Global peering so West US 2 can route to the East US 2 endpoint subnet.
az network vnet peering create \
-g rg-spoke-wus2 -n peer-wus2-to-eus2 \
--vnet-name vnet-spoke-wus2 \
--remote-vnet "/subscriptions/<sub>/resourceGroups/rg-spoke-eus2/providers/Microsoft.Network/virtualNetworks/vnet-spoke-eus2" \
--allow-vnet-access --allow-forwarded-traffic
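# Symmetric peering on the EUS2 side; traffic only flows once both directions exist.
az network vnet peering create \
-g rg-spoke-eus2 -n peer-eus2-to-wus2 \
--vnet-name vnet-spoke-eus2 \
--remote-vnet "/subscriptions/<sub>/resourceGroups/rg-spoke-wus2/providers/Microsoft.Network/virtualNetworks/vnet-spoke-wus2" \
--allow-vnet-access --allow-forwarded-traffic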
Because the zone is global and both VNets are linked, West US 2 already resolves sql-primary.privatelink.database.windows.net to the East private IP. Peering supplies the route; the connection now succeeds. What it costs you is unavoidable and must be on the design record:
Every byte the West region sends to the East endpoint crosses a global VNet peering and is billed as inbound and outbound inter-region transfer, plus the physical round-trip latency between regions on every request. For a chatty data tier that is a per-query tax measured in tens of milliseconds and real money. Use Pattern B only for components that genuinely cannot be regionalised.
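The tax scales with how chatty the dependency is, so put numbers on it before accepting Pattern B. A back-of-envelope sketch: call counts, round-trip time and payload sizes are inputs you measure yourself, not values this article supplies, and pricing is deliberately left out because it varies by region pair. Both helpers are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: estimate the Pattern B tax for one cross-region dependency.
# All inputs are assumptions supplied from your own measurements.

# Added latency per user request: sequential cross-region calls x round trip.
added_latency_ms() {
  local sequential_calls="$1" rtt_ms="$2"
  echo $(( sequential_calls * rtt_ms ))
}

# Cross-region volume per 30-day month, in decimal GB (10^9 bytes).
# bytes_per_request should include both request and response payloads.
monthly_gb() {
  local bytes_per_request="$1" requests_per_sec="$2"
  echo $(( bytes_per_request * requests_per_sec * 30 * 86400 / 1000000000 ))
}
```

With illustrative inputs, added_latency_ms 20 70 prints 1400: twenty sequential calls at a 70 ms round trip add 1.4 s to every request. Since both inbound and outbound transfer are billed, apply the per-GB rate to the full monthly_gb figure on each side of the peering.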
A frequent refinement is to terminate the cross-region hop at the hub rather than mesh every spoke: spokes peer to their regional hub, hubs are connected (peering or a global ER/VPN), and one endpoint lives in or behind a hub. The cost and latency characteristics are the same; you have simply centralised the path.
The honest decision rule: Pattern A unless a resource is physically single-homed; Pattern B only for that resource, and only with the egress line item written down.
Step 5 - Failover DNS: TTL and avoiding stale private records
This is the step single-region designs never have to think about, and it is where cutovers fail. Two distinct DNS layers govern a regional failover, and they have different TTL semantics:
- The public steering layer (Front Door anycast, or Traffic Manager DNS). Front Door fails over per request with no DNS change at all. Traffic Manager fails over by changing its DNS answer, gated by its record TTL: set it low (30 seconds is a common choice) rather than a multi-minute value, and remember that client resolvers may still cache the old answer until it expires.
- The Private DNS records behind each region. A privatelink A record has its own TTL. Records written by a private DNS zone group get a 10-second TTL, which is deliberately short, but it only matters if the record is removed or changed.
The failure pattern: an operator “drains” East by deleting the East private endpoint. The zone group reaps the A record. But the public steering layer has not yet stopped sending East-region clients to East. Worse, in Pattern B, the surviving region was resolving that exact record to reach a cross-region dependency. For as long as the old answer stays cached, clients dial a private IP whose NIC is gone: a black hole that is harder to diagnose than a clean public failover, because the name still resolves, just to an address with nothing behind it.
Two rules keep cutovers clean:
- Drain at the steering layer first, tear down endpoints last. Pull the region out of Front Door, or set its Traffic Manager endpoint to Disabled, and let in-flight connections drain before you touch any private endpoint. The private record should be the last thing to disappear, not the first.
- Pre-provision the standby; never resolve toward nothing. In active-active both regional endpoints already exist and both A records are always present, so failover is purely a steering decision and there is no private record to add or remove on the hot path. This is the single biggest reason active-active beats cold-standby for private connectivity: the DNS is static and correct in every state.
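The first rule can be enforced by the pipeline rather than remembered. A minimal sketch of a teardown gate, assuming your pipeline supplies the steering status (for Traffic Manager, az network traffic-manager endpoint show ... --query endpointStatus -o tsv), the seconds elapsed since the drain, and the steering-layer TTL (0 for Front Door). The 60-second drain window is an assumed default, not a platform value, and safe_to_teardown is a hypothetical helper. The gate does not protect a Pattern B endpoint that another region still resolves; that endpoint should not be torn down during a drain at all.

```bash
#!/usr/bin/env bash
# Sketch: refuse to delete a regional private endpoint until the region is
# drained at the steering layer and cached steering answers have expired.
# DRAIN_WINDOW_S is an assumed default for in-flight connections to finish.
DRAIN_WINDOW_S="${DRAIN_WINDOW_S:-60}"

safe_to_teardown() {
  local steering_status="$1"       # e.g. Traffic Manager endpointStatus
  local seconds_since_drain="$2"   # time since the endpoint was disabled
  local steering_ttl_s="$3"        # Traffic Manager DNS TTL; 0 for Front Door
  local required=$(( steering_ttl_s + DRAIN_WINDOW_S ))

  if [ "$steering_status" != "Disabled" ]; then
    echo "refuse: region still receives traffic (status=$steering_status)" >&2
    return 1
  fi
  if [ "$seconds_since_drain" -lt "$required" ]; then
    echo "refuse: wait until ${required}s after the drain" >&2
    return 1
  fi
  return 0
}
```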
Inspect the live record and its TTL before any cutover:
az network private-dns record-set a show \
-g rg-connectivity-dns -z privatelink.database.windows.net \
-n sql-primary --query "{ttl:ttl, ips:aRecords[].ipv4Address}" -o jsonc
# PowerShell, from a Windows workload inside a regional VNet: confirm you get the LOCAL private IP.
Resolve-DnsName kv-app-eus2.vault.azure.net -Type A |
Where-Object { $_.QueryType -eq 'A' } | Select-Object Name, IPAddress
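Eyeballing the answer works for two regions; checking that it falls inside the local VNet's address space scales to any number. A minimal pure-bash sketch (IPv4 only), assuming the hypothetical 10.20.0.0/16 (East) and 10.30.0.0/16 (West) prefixes implied by the example IPs. ip_to_int and ip_in_cidr are illustrative helpers:

```bash
#!/usr/bin/env bash
# Sketch: does an IPv4 address fall inside a CIDR prefix? Pure bash, IPv4 only.
ip_to_int() {
  local IFS=.
  local a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

ip_in_cidr() {
  local ip="$1" net="${2%/*}" bits="${2#*/}"
  # Build the netmask; a /0 prefix matches everything.
  local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```

From a Linux workload in East, for example: ip="$(dig +short kv-app-eus2.vault.azure.net | tail -n 1)"; ip_in_cidr "$ip" 10.20.0.0/16 || echo "non-local answer: $ip". The tail keeps only the final A record after the CNAME chain.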
Step 6 - Hybrid clients: on-prem to the correct regional IP
On-prem resolvers cannot see Private DNS zones directly; they must forward to something inside Azure that can. The supported entry point is the Azure DNS Private Resolver inbound endpoint, and the cross-region trap is forwarding all on-prem queries to a single regional inbound endpoint. A resolver in Frankfurt that forwards to an East US 2 inbound endpoint sends every privatelink lookup across the Atlantic and loses private resolution entirely if East US 2 is down.
Deploy a DNS Private Resolver with an inbound endpoint in each region, then make the on-prem forwarder geo-aware about which inbound IP it targets:
// East US 2 shown; deploy the same resolver and inbound endpoint in every region.
param hubVnetEus2Id string
param inboundSubnetEus2Id string
resource resolver 'Microsoft.Network/dnsResolvers@2023-07-01-preview' = {
  name: 'dnspr-eus2'
  location: 'eastus2'
  properties: {
    virtualNetwork: { id: hubVnetEus2Id }
  }
}
resource inbound 'Microsoft.Network/dnsResolvers/inboundEndpoints@2023-07-01-preview' = {
  parent: resolver
  name: 'inbound-eus2'
  location: 'eastus2'
  properties: {
    ipConfigurations: [
      {
        privateIpAllocationMethod: 'Dynamic'
        subnet: { id: inboundSubnetEus2Id } // delegated to Microsoft.Network/dnsResolvers
      }
    ]
  }
}
On the on-prem side, point each site’s conditional forwarder for the privatelink suffixes at its nearest regional inbound endpoint:
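# Frankfurt site (BIND named.conf) -> nearest inbound endpoint, West Europe (10.40.0.4)
zone "privatelink.vaultcore.azure.net" { type forward; forward only; forwarders { 10.40.0.4; }; };
zone "privatelink.database.windows.net" { type forward; forward only; forwarders { 10.40.0.4; }; };
# US sites (BIND named.conf) -> East US 2 inbound endpoint (10.20.0.4), same suffixes
zone "privatelink.vaultcore.azure.net" { type forward; forward only; forwarders { 10.20.0.4; }; };
zone "privatelink.database.windows.net" { type forward; forward only; forwarders { 10.20.0.4; }; };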
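For a true active-active dependency you also want resilience in the resolver path, for example a second region's inbound endpoint listed as a fallback forwarder, so a dead regional resolver does not take private resolution down with it. Be clear about what forwarding controls, though. Because the zone is global, every inbound endpoint returns the same records: forwarding to the nearest region keeps the query path short and survives a regional outage, but which regional private IP a client learns is decided by the name it dials. On-prem clients, like Pattern A app instances, must be configured with their own region's FQDN.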
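With more than a couple of sites and suffixes, the forwarder stanzas are easier to generate than to hand-maintain. A minimal sketch that renders BIND zone statements for one site, listing the nearest inbound endpoint and a second region's as fallback. The zone list and IPs are the example values above, render_forwarders is a hypothetical helper, and how a DNS server chooses between several listed forwarders varies by product, so confirm it prefers the near one:

```bash
#!/usr/bin/env bash
# Sketch: render BIND conditional-forward zones for one on-prem site.
# Usage: render_forwarders <nearest-inbound-ip> <fallback-inbound-ip>
# The zone list is an assumption; add every privatelink zone you host.
PRIVATELINK_ZONES=(privatelink.vaultcore.azure.net privatelink.database.windows.net)

render_forwarders() {
  local nearest="$1" fallback="$2" zone
  for zone in "${PRIVATELINK_ZONES[@]}"; do
    # "forward only": never fall back to public recursion for privatelink names.
    printf 'zone "%s" { type forward; forward only; forwarders { %s; %s; }; };\n' \
      "$zone" "$nearest" "$fallback"
  done
}
```

Frankfurt would run render_forwarders 10.40.0.4 10.20.0.4 and US sites the reverse, then include the output in their named.conf.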
Step 7 - Data-tier interplay: pair private links with geo-replicated PaaS
Private connectivity and data replication are orthogonal and you must wire both, or you build a fast private path to a database that cannot fail over.
- Azure SQL with a failover group exposes a <fog-name>.database.windows.net read-write listener that always points at the current primary server. Put a private endpoint on every server in the group, each in its own region; the listener resolves through the current primary's privatelink record, so private connectivity follows the role automatically.
- Cosmos DB in multi-region write mode wants a private endpoint per region; configure the SDK's preferred regions so each instance uses its local account region.
- Storage with RA-GRS or object replication needs an endpoint per account; the RA-GRS secondary endpoint (<account>-secondary) is a distinct name with its own record and needs its own private endpoint.
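The trap is putting a private endpoint only on the server that is primary today. On failover the read-write role moves to the other region's server and the listener's CNAME follows it; if that server has no endpoint, the listener no longer resolves to a private IP and connections fail at exactly the moment you need them. Point clients at the listener, never at a server name, and give every server behind it an endpoint: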
# Private endpoint on the West US 2 server of the failover group (group-id sqlServer); create the matching one for sql-app-eus2 in East US 2.
az network private-endpoint create \
-g rg-data-wus2 -n pe-sqlfog-wus2 \
--vnet-name vnet-spoke-wus2 --subnet snet-pe \
--private-connection-resource-id "/subscriptions/<sub>/resourceGroups/rg-data-wus2/providers/Microsoft.Sql/servers/sql-app-wus2" \
--group-id sqlServer --connection-name fog-wus2-conn
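Now app-fog.database.windows.net resolves privately in both regions, and a failover-group flip moves the read-write role without any DNS surgery on your side. It always resolves to the current primary's endpoint, though, so read-write traffic from the other region crosses the peering: on its write path, a single-primary database is a Pattern B dependency and belongs on the design record.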
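A missing endpoint on whichever server is not primary today only surfaces at failover, so it is worth checking in advance. A minimal pre-flight sketch: the two input lists would come from your inventory (for example the failover group's partner servers and the targets of your private endpoint connections), and collecting them is left out as an assumption. missing_endpoints is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Sketch: report failover-group servers that have no private endpoint.
# Arg 1: space-separated server names in the failover group.
# Arg 2: space-separated server names that already have a private endpoint.
# How both lists are collected is up to your inventory tooling (assumption).
missing_endpoints() {
  local fog_servers="$1" pe_servers="$2" server missing=""
  # Unquoted expansion is intentional: split the lists on whitespace.
  for server in $fog_servers; do
    case " $pe_servers " in
      *" $server "*) ;;                   # endpoint present
      *) missing="$missing $server" ;;    # endpoint absent
    esac
  done
  missing="${missing# }"
  [ -n "$missing" ] && echo "$missing"
  [ -z "$missing" ]                       # exit 0 only when nothing is missing
}
```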
Enterprise scenario
A global payments platform ran active-active across East US 2 and West Europe behind Azure Front Door. The web tier failed over in well under a second on a regional brownout: the anycast edge simply stopped sending traffic to the unhealthy origin. But during a game-day, when they drained East US 2, card-auth requests landing on the still-serving West Europe region picked up ~1.4s of added latency, and a noticeable inter-region transfer bill appeared overnight.
The root cause was a Pattern-B mistake hiding inside a Pattern-A design. Both regions had been pointed at a single shared private endpoint for the tokenization vault (a Key Vault holding the HSM-backed key), provisioned only in East US 2 because “Key Vault is highly available anyway.” Every West Europe auth call resolved kv-token.privatelink.vaultcore.azure.net to the East US 2 private IP and crossed the ocean over hub peering: tolerable at idle, brutal under the failover load that pushed all traffic onto that one cross-region path.
The fix was to regionalise the dependency: stand up a Key Vault in West Europe, replicate the keys, put a private endpoint in each region, and parameterise each regional deployment with its local vault URI so the privatelink record it resolves is always the local one. The shared cross-region endpoint and its peering hop were retired.
# Per-region vault URI injected into each regional deployment - no cross-region resolve.
az containerapp update -g rg-app-weu -n app-weu \
--set-env-vars KEYVAULT_URI="https://kv-token-weu.vault.azure.net"
az containerapp update -g rg-app-eus2 -n app-eus2 \
--set-env-vars KEYVAULT_URI="https://kv-token-eus2.vault.azure.net"
Added latency on the vault call dropped from ~1.4s to under 5ms, and the inter-region transfer line item went to near zero. The lesson they wrote into their reference architecture: in an active-active topology, every private dependency must be regional unless it is provably impossible to regionalise — and any cross-region private endpoint is a named exception with a cost owner, not a default.
Verify
Validate resolution and connectivity from each region independently, then force a failover and confirm the surviving region is unaffected.
# 1. From a VM/container in EACH region, confirm the LOCAL private IP is returned.
# East should get the East endpoint IP; West should get the West endpoint IP.
nslookup kv-app-eus2.vault.azure.net # run on East workload -> 10.20.3.7
nslookup kv-app-wus2.vault.azure.net # run on West workload -> 10.30.3.7
# 2. Confirm the failover-group listener resolves privately and connects in both regions.
nslookup app-fog.database.windows.net # private IP, both regions
nc -vz app-fog.database.windows.net 1433 # TCP reachability over the private path
// 3. Prove the data call stays intra-region: no cross-region peering bytes for backend traffic.
// Region-to-region peering traffic should be ~0 for a correct Pattern A.
AzureNetworkAnalytics_CL
| where SubType_s == "FlowLog"
| where DestPort_d in (443, 1433)
| extend src = SrcIP_s, dst = DestIP_s
| where (src startswith "10.20." and dst startswith "10.30.") or (src startswith "10.30." and dst startswith "10.20.") // either region reaching the other's data = a finding
| summarize crossRegionFlows = count() by bin(TimeGenerated, 5m)
# 4. Force a regional failover and confirm no stale private record.
# a) Drain at the steering layer FIRST.
az network traffic-manager endpoint update \
-g rg-global -n ep-eus2 --profile-name tm-app --type azureEndpoints \
--endpoint-status Disabled
# b) Confirm the surviving region still resolves its OWN local record and serves.
# c) Only after drain: tear down the East endpoint if the design calls for it.
A correct topology shows: each region resolves its local private IP, the failover-group listener resolves privately in both, KQL shows no cross-region backend flows for Pattern A, and draining a region at the steering layer leaves the survivor’s resolution and connectivity completely untouched.