Going multi-region is the easy part. Routing users to the right region, detecting a regional brownout in seconds, and failing over without manual intervention is where most designs fall apart. This article combines Azure Front Door’s anycast Layer-7 edge with Traffic Manager’s DNS steering to build active-active and active-passive topologies, with health probes, WAF at the edge, and origin lockdown.
Choosing the right global tier
Azure has three “global-ish” load balancing services and they are not interchangeable. Pick wrong and you either pay for capability you can’t use or hit a protocol wall later.
| Service | OSI layer | Steering mechanism | Failover trigger | Best for |
|---|---|---|---|---|
| Front Door (Standard/Premium) | L7 (HTTP/HTTPS) | Anycast + reverse proxy | Backend health probe at the edge | Web apps, APIs, anything HTTP that wants caching/WAF/TLS offload |
| Traffic Manager | DNS (L-none, it never sees traffic) | DNS responses (CNAME/A) | Endpoint health probe, reflected in DNS answers | Any protocol; non-HTTP endpoints; nesting other globals |
| Cross-region Load Balancer | L4 (TCP/UDP) | Anycast frontend IP | Regional LB health | Non-HTTP TCP/UDP that still wants a single anycast IP and connection-level failover |
The decisive question is the protocol. If everything is HTTP(S), Front Door alone covers the common case and brings WAF and edge caching with it, with no separate service to deploy. The moment you have SMTP, a game server, a custom TCP service, or you need to route between heterogeneous endpoint types, you reach for Traffic Manager or cross-region Load Balancer.
Front Door and Traffic Manager are not mutually exclusive. A robust pattern is Front Door for the web tier with Traffic Manager nested above it to add DNS-level failover across endpoint types Front Door cannot represent. We build exactly that in Step 4.
Anycast Layer-7 vs DNS-based steering
This distinction drives your failover latency budget, so be precise about it.
Front Door uses anycast. The same IP is announced from every Microsoft edge POP, so a client’s packets land at the nearest POP by BGP. The POP terminates TLS, runs the WAF, and reverse-proxies to a healthy origin. Because the edge probes origins and holds the connection, when an origin dies the edge simply stops sending it traffic - no DNS change, no IP change. Detection still takes a few probe intervals, but once the edge marks an origin unhealthy the switch needs nothing from the client; the edge decides per request.
Traffic Manager uses DNS. It returns the address (or CNAME) of a healthy endpoint, and clients then connect directly. Failover speed is therefore gated by DNS TTL plus resolver caching plus probe interval. Even with a 30-second TTL, a client that resolved 25 seconds ago keeps hitting the dead endpoint until its cache expires. You cannot make DNS-based failover sub-second; plan for tens of seconds to a couple of minutes in the real world.
The takeaway: put HTTP traffic behind Front Door for fast, edge-driven failover, and use Traffic Manager only where you need protocol breadth or nesting, accepting its DNS-bound latency.
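To put rough numbers on that DNS-bound budget, here is a back-of-the-envelope sketch. It assumes the worst case: the endpoint dies just after a successful probe, every failed probe runs to its full timeout, and resolvers honor the TTL exactly. The function name and the sample values (30s interval, 3 tolerated failures, 10s timeout, 30s TTL) are illustrative and not taken from any Azure tooling.

```shell
#!/usr/bin/env bash
# Rough worst-case DNS failover window in seconds (an estimate, not an SLA).
# Assumes an endpoint is marked unhealthy on the first probe after the
# tolerated failures are used up.
dns_failover_budget() {
  local interval=$1 tolerated=$2 timeout=$3 ttl=$4
  # (tolerated + 1) probe intervals to exhaust the tolerance, one final
  # probe timeout, then one full TTL for cached answers to expire.
  echo $(( (tolerated + 1) * interval + timeout + ttl ))
}

dns_failover_budget 30 3 10 30   # prints 160
```

Resolvers that ignore short TTLs push the real figure higher, which is why this is a floor for planning rather than a guarantee.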
Step 1 - Multi-region origin group with health probes and latency routing
We’ll define an origin group with two regional backends (East US 2 and West Europe) and let latency-based routing send each client to the lowest-latency healthy origin. I’ll use the az afd CLI throughout; it maps cleanly to Bicep/Terraform if you prefer declarative.
RG=rg-gtm-prod
PROFILE=afd-gtm-prod
ENDPOINT=app-gtm
# Premium is required for managed WAF rule sets, bot protection, and Private Link origins; Standard supports custom WAF rules only.
az afd profile create \
--resource-group $RG \
--profile-name $PROFILE \
--sku Premium_AzureFrontDoor
az afd endpoint create \
--resource-group $RG \
--profile-name $PROFILE \
--endpoint-name $ENDPOINT \
--enabled-state Enabled
Now the origin group. The health probe and load-balancing settings live on the group, not the individual origins.
az afd origin-group create \
--resource-group $RG \
--profile-name $PROFILE \
--origin-group-name og-web \
--probe-request-type GET \
--probe-protocol Https \
--probe-path /healthz \
--probe-interval-in-seconds 30 \
--sample-size 4 \
--successful-samples-required 3 \
--additional-latency-in-milliseconds 50
A few of these flags carry real weight:
- --probe-path /healthz should hit a deep endpoint that checks downstream dependencies (DB, cache), not a static 200. A shallow probe keeps an origin “healthy” while it’s failing real requests.
- --additional-latency-in-milliseconds 50 is the latency sensitivity. Front Door treats any origin within this margin of the fastest as “equally close” and load-balances across them. Widen it to spread load across regions; narrow it to pin clients to the nearest.
- --sample-size 4 with --successful-samples-required 3 means an origin must pass 3 of the last 4 probes to stay in rotation.
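For a rough feel of what the sample settings mean for detection time, the sketch below uses a deliberately simplified model, not Front Door's documented algorithm. It assumes a single probing location and an origin that leaves rotation as soon as fewer than the required number of the last sample-size probes passed. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model: consecutive failed probes needed before an origin is dropped,
# starting from a window in which every probe succeeded.
failures_to_drain() {
  local sample_size=$1 required=$2
  echo $(( sample_size - required + 1 ))
}

# Approximate detection time in seconds for a hard-down origin.
detection_seconds() {
  local interval=$1 sample_size=$2 required=$3
  echo $(( interval * $(failures_to_drain "$sample_size" "$required") ))
}

failures_to_drain 4 3      # prints 2
detection_seconds 30 4 3   # prints 60
```

Real detection is fuzzier because each edge POP probes on its own schedule, so treat the result as an order-of-magnitude estimate to check against a drill.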
Add the two regional origins. --priority and --weight are what we tune in Step 2; for pure latency routing, leave them equal.
az afd origin create \
--resource-group $RG --profile-name $PROFILE \
--origin-group-name og-web --origin-name eastus2 \
--host-name app-eastus2.azurewebsites.net \
--origin-host-header app-eastus2.azurewebsites.net \
--http-port 80 --https-port 443 \
--priority 1 --weight 1000 --enabled-state Enabled
az afd origin create \
--resource-group $RG --profile-name $PROFILE \
--origin-group-name og-web --origin-name westeurope \
--host-name app-westeurope.azurewebsites.net \
--origin-host-header app-westeurope.azurewebsites.net \
--http-port 80 --https-port 443 \
--priority 1 --weight 1000 --enabled-state Enabled
Finally a route ties the endpoint to the origin group:
az afd route create \
--resource-group $RG --profile-name $PROFILE \
--endpoint-name $ENDPOINT --route-name web-route \
--origin-group og-web \
--supported-protocols Https \
--https-redirect Enabled \
--forwarding-protocol HttpsOnly \
--link-to-default-domain Enabled
With both origins at equal priority and weight, Front Door now performs latency routing: each edge POP picks the closest healthy origin, and probes it every 30s.
Step 2 - Priority routing (active-passive) vs weighted (active-active)
Front Door’s load-balancing algorithm is a two-level decision: priority first, then weight within the lowest healthy priority tier.
Active-passive with priority
Set the standby region to a higher --priority number (higher number = lower precedence). Front Door only sends traffic to priority 2 when all priority 1 origins are unhealthy.
# Promote eastus2 to primary, demote westeurope to warm standby.
az afd origin update \
--resource-group $RG --profile-name $PROFILE \
--origin-group-name og-web --origin-name eastus2 \
--priority 1
az afd origin update \
--resource-group $RG --profile-name $PROFILE \
--origin-group-name og-web --origin-name westeurope \
--priority 2
Now all traffic flows to East US 2. If its probes fail, the edge drains it and shifts everything to West Europe; when East US 2 recovers, traffic fails back automatically. This is the classic warm-standby model: cheap and simple, but clients far from the surviving region pay a latency penalty during failover.
Active-active with weights
Keep all origins at the same priority and let weight split traffic. Weights are proportional, not percentages.
# 70/30 split across two active regions at equal priority.
az afd origin update \
--resource-group $RG --profile-name $PROFILE \
--origin-group-name og-web --origin-name eastus2 \
--priority 1 --weight 700
az afd origin update \
--resource-group $RG --profile-name $PROFILE \
--origin-group-name og-web --origin-name westeurope \
--priority 1 --weight 300
Important nuance: at equal priority, latency routing takes precedence over weight. Front Door first filters to origins within the additional-latency-in-milliseconds window, then applies weights among that filtered set. If your two regions are far apart, most edges see only one region as “closest” and weights appear to be ignored. Weighted distribution is most visible when origins are latency-comparable from a given edge, or when you deliberately widen the latency window.
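The priority, latency-window, and weight steps can be modelled in a few lines. This is a toy sketch of the decision order described above, not Front Door's implementation. The function name, the input format, and the 20 ms and 110 ms latencies are made up for illustration, and every origin is assumed healthy.

```shell
#!/usr/bin/env bash
# Toy model of origin selection as seen from one edge POP.
# Arguments: latency window in ms, then origins as "name:priority:latency_ms:weight".
candidate_shares() {
  local window_ms=$1
  shift
  printf '%s\n' "$@" | awk -F: -v window="$window_ms" '
    {
      name[NR] = $1; prio[NR] = $2 + 0; lat[NR] = $3 + 0; weight[NR] = $4 + 0
      if (NR == 1 || prio[NR] < best_prio) best_prio = prio[NR]
    }
    END {
      # Level 1: only the lowest priority number is eligible.
      first = 1
      for (i = 1; i <= NR; i++)
        if (prio[i] == best_prio && (first || lat[i] < min_lat)) { min_lat = lat[i]; first = 0 }
      # Level 2: keep origins within the latency window of the fastest one,
      # then express each weight as a share of the surviving total.
      for (i = 1; i <= NR; i++)
        if (prio[i] == best_prio && lat[i] <= min_lat + window) total += weight[i]
      for (i = 1; i <= NR; i++)
        if (prio[i] == best_prio && lat[i] <= min_lat + window)
          printf "%s %d%%\n", name[i], weight[i] * 100 / total
    }'
}

candidate_shares 50  eastus2:1:20:700 westeurope:1:110:300
candidate_shares 100 eastus2:1:20:700 westeurope:1:110:300
```

With the 50 ms window, the 90 ms gap leaves only eastus2 in the candidate set, so it takes 100% despite the 700/300 weights. Widening the window to 100 ms brings westeurope back in and the 70/30 split reappears.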
For genuine active-active where both regions take live writes, the hard problem is not routing but data: you need multi-region writes (Cosmos DB multi-region, or app-level conflict handling). Front Door will happily send a user to either region; your data layer has to cope.
Step 3 - WAF policy and custom rules at the edge
A core reason to front everything with Front Door is the WAF running at the POP, blocking attacks before they reach your origins. Create a Premium WAF policy, attach the managed rulesets, and add a custom rule.
# WAF policies live under the 'network front-door' command group.
az network front-door waf-policy create \
--resource-group $RG \
--name wafGtmProd \
--sku Premium_AzureFrontDoor \
--mode Prevention
Add the Microsoft-managed Default Rule Set (DRS) and the Bot Manager ruleset. In Prevention mode these actively block; start in Detection mode in a new environment to baseline false positives before flipping to Prevention.
az network front-door waf-policy managed-rules add \
--resource-group $RG --policy-name wafGtmProd \
--type Microsoft_DefaultRuleSet --version 2.1 --action Block
az network front-door waf-policy managed-rules add \
--resource-group $RG --policy-name wafGtmProd \
--type Microsoft_BotManagerRuleSet --version 1.0
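For the Detection-first rollout mentioned above, the mode can be switched on the existing policy without recreating it. A minimal sketch, assuming RG is set as in the earlier steps and the policy keeps the name wafGtmProd; the wrapper function is hypothetical and only guards against typos in the mode name.

```shell
#!/usr/bin/env bash
# Switch the WAF policy between Detection (log only) and Prevention (block).
set_waf_mode() {
  case "$1" in
    Detection|Prevention) ;;
    *) echo "usage: set_waf_mode Detection|Prevention" >&2; return 2 ;;
  esac
  az network front-door waf-policy update \
    --resource-group "$RG" --name wafGtmProd --mode "$1"
}

# Example rollout: baseline false positives first, enforce later.
#   set_waf_mode Detection
#   set_waf_mode Prevention
```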
A common custom rule is a per-client-IP rate limit to blunt credential stuffing. The rule below blocks a client that exceeds 100 requests per minute to the login path.
az network front-door waf-policy rule create \
--resource-group $RG --policy-name wafGtmProd \
--name rateLimitLogin --priority 10 \
--rule-type RateLimitRule \
--rate-limit-duration 1 \
--rate-limit-threshold 100 \
--action Block --defer
az network front-door waf-policy rule match-condition add \
--resource-group $RG --policy-name wafGtmProd \
--name rateLimitLogin \
--match-variable RequestUri --operator Contains \
--values "/api/login"
Now associate the policy with the endpoint domain through a security policy on the AFD profile:
WAF_ID=$(az network front-door waf-policy show \
--resource-group $RG --name wafGtmProd --query id -o tsv)
az afd security-policy create \
--resource-group $RG --profile-name $PROFILE \
--security-policy-name sp-web \
--domains $(az afd endpoint show -g $RG --profile-name $PROFILE \
--endpoint-name $ENDPOINT --query id -o tsv) \
--waf-policy $WAF_ID
The WAF now evaluates every request at the edge before any origin is contacted. Geo-filtering, IP allowlists for admin paths, and header-based rules all live here too.
Step 4 - Layering Traffic Manager for nested DNS failover
Front Door covers HTTP. Suppose you also expose a non-HTTP service - an MQTT broker or SMTP relay - and want a single hostname that fails over across regions for all of it. Traffic Manager sits above Front Door as a DNS-level coordinator.
Create a profile with priority routing. The primary endpoint is the Front Door endpoint (an externalEndpoints target by FQDN); the secondary can be another global service or a regional endpoint.
az network traffic-manager profile create \
--resource-group $RG --name tm-gtm-prod \
--routing-method Priority \
--unique-dns-name gtm-prod-app \
--ttl 30 \
--protocol HTTPS --port 443 --path "/healthz" \
--interval 30 --timeout 10 --max-failures 3
Add Front Door as the primary external endpoint. Because Front Door is itself anycast and highly available, this endpoint rarely fails - but Traffic Manager gives you a DNS escape hatch to a separate stack (another cloud, or a static maintenance page) if Front Door ever returns unhealthy.
az network traffic-manager endpoint create \
--resource-group $RG --profile-name tm-gtm-prod \
--name afd-primary --type externalEndpoints \
--target app-gtm-xxxx.z01.azurefd.net \
--endpoint-status Enabled --priority 1
az network traffic-manager endpoint create \
--resource-group $RG --profile-name tm-gtm-prod \
--name dr-secondary --type externalEndpoints \
--target dr-static.example.net \
--endpoint-status Enabled --priority 2
Nesting caveat: when an endpoint is itself another Traffic Manager profile, use the nestedEndpoints type and set min-child-endpoints so the parent only considers the child “healthy” when enough of its children are up. For an externalEndpoints Front Door target, Traffic Manager just probes the FQDN’s health path directly.
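For the nested case, a minimal sketch might look like the following. The child profile name tm-child-europe, the endpoint name, the threshold of 2, and priority 3 are all hypothetical; RG and the parent profile name are reused from the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical child Traffic Manager profile to nest under tm-gtm-prod.
CHILD_PROFILE=tm-child-europe

add_nested_endpoint() {
  local child_id
  # Nested endpoints reference the child profile by resource ID, not by FQDN.
  child_id=$(az network traffic-manager profile show \
    --resource-group "$RG" --name "$CHILD_PROFILE" --query id -o tsv) || return 1
  # The parent treats the child as healthy only while at least 2 of its endpoints are up.
  az network traffic-manager endpoint create \
    --resource-group "$RG" --profile-name tm-gtm-prod \
    --name child-europe --type nestedEndpoints \
    --target-resource-id "$child_id" \
    --min-child-endpoints 2 \
    --priority 3
}

# Usage: add_nested_endpoint
```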
Be honest about what this buys you: the DNS failover here is bound by the 30s TTL and resolver caching, so it is slow relative to Front Door’s in-line failover. Use Traffic Manager as the coarse, cross-stack safety net and let Front Door handle the fast, fine-grained regional decisions underneath it.
Origin lockdown: only accept traffic from your Front Door
A latency-routed, WAF-protected edge is pointless if attackers can hit your origins directly and bypass all of it. Lock origins down with two complementary controls.
1. Restrict inbound to the AzureFrontDoor.Backend service tag. On the NSG protecting your origins (or via App Service access restrictions), allow inbound only from that service tag.
az network nsg rule create \
--resource-group $RG --nsg-name nsg-origins \
--name Allow-AFD-Backend --priority 100 \
--direction Inbound --access Allow --protocol Tcp \
--source-address-prefixes AzureFrontDoor.Backend \
--destination-port-ranges 443 \
--destination-address-prefixes VirtualNetwork
The service tag alone is necessary but not sufficient - it permits any Front Door tenant in Azure, not just yours. That’s why you also validate a per-profile header.
2. Validate the X-Azure-FDID header. Every Front Door profile has a unique ID sent in the X-Azure-FDID request header. Your origin (or the App Service / APIM in front of it) must reject requests whose header doesn’t match your profile ID.
# Retrieve your profile's Front Door ID.
az afd profile show \
--resource-group $RG --profile-name $PROFILE \
--query frontDoorId -o tsv
For App Service, enforce it via an access-restriction rule on the X-Azure-FDID header so only your profile’s traffic is admitted:
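# A header filter must be attached to a source rule, so pair it with the service tag.
az webapp config access-restriction add \
--resource-group $RG --name app-eastus2 \
--rule-name allow-our-afd --priority 100 --action Allow \
--service-tag AzureFrontDoor.Backend \
--http-header x-azure-fdid=<your-front-door-id>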
Together these mean packets can only arrive from the Front Door backend range and must carry your unique profile ID. Direct-to-origin attacks are shut down.
Enterprise scenario
A payments platform ran active-passive across East US 2 and West Europe behind Front Door, priority routing, /healthz probing the API only. During a Cosmos DB regional incident East US 2 stayed “healthy” - the API was up, but every write 500’d. Front Door kept all traffic on the broken primary; the probe never saw the dependency. The fix was a probe that exercised the real failure mode: a dedicated /healthz/deep that does a lightweight Cosmos point-read against the region’s write endpoint and returns 503 when the SDK reports the region isn’t writable.
// Assumes `using System.Net;` and `using Microsoft.Azure.Cosmos;` at the top of Program.cs.
app.MapGet("/healthz/deep", async (CosmosClient c) =>
{
    try
    {
        var container = c.GetContainer("ledger", "txn");
        // Point-read forces a round-trip to the regional write endpoint.
        // A per-request level can only relax the account default, so this assumes a Strong account.
        await container.ReadItemAsync<object>("probe", new PartitionKey("probe"),
            new ItemRequestOptions { ConsistencyLevel = ConsistencyLevel.Strong });
        return Results.Ok();
    }
    catch (CosmosException ex) when (ex.StatusCode is HttpStatusCode.ServiceUnavailable
                                         or HttpStatusCode.TooManyRequests)
    {
        return Results.StatusCode(503); // Drain this origin.
    }
});
The gotcha is tuning: a strong-consistency read every 30s from every edge POP adds real RU cost and can itself trip 429s, which would falsely drain a healthy region. They settled on --probe-interval-in-seconds 30 with --sample-size 4 --successful-samples-required 2 (tolerating up to two transient 429s in any four-probe window) and pinned the probe to a tiny dedicated container with its own throughput. After that, the next regional write outage drained East US 2 in under two minutes with zero failed customer writes.
Verify
Confirm routing, failover, and lockdown actually behave as designed - never trust the config alone.
Traffic flows through the edge. The X-Cache and X-Azure-Ref response headers prove the request went through Front Door.
curl -sSI https://app-gtm-xxxx.z01.azurefd.net/ \
| grep -iE 'x-cache|x-azure-ref|server'
Origin lockdown holds. A direct request to the origin without the header should be rejected (403), while traffic via Front Door succeeds.
# Direct to origin - expect 403 from the access restriction.
curl -sS -o /dev/null -w "%{http_code}\n" https://app-eastus2.azurewebsites.net/
Traffic Manager answers with the primary. Resolve the DNS name and verify the TTL and target.
dig +noall +answer gtm-prod-app.trafficmanager.net
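To turn that manual check into something a pipeline can assert on, the dig output can be fed through a small filter. This is a sketch only: the function name is hypothetical, and it assumes the first CNAME in the answer is the Traffic Manager record and that dig prints fully qualified names with a trailing dot.

```shell
#!/usr/bin/env bash
# Reads `dig +noall +answer` output on stdin (name, TTL, class, type, rdata).
# Succeeds only if the first CNAME has TTL <= max_ttl and points at expected_target.
check_tm_answer() {
  local max_ttl=$1 expected_target=$2
  awk -v max="$max_ttl" -v want="$expected_target." '
    $4 == "CNAME" && !seen { seen = 1; ok = ($2 + 0 <= max && $5 == want) }
    END { exit (ok ? 0 : 1) }'
}

# Usage (hostnames are the placeholders used in this article):
#   dig +noall +answer gtm-prod-app.trafficmanager.net \
#     | check_tm_answer 30 app-gtm-xxxx.z01.azurefd.net
```

The TTL comparison is "less than or equal" because a recursive resolver returns the remaining TTL, which counts down from the configured 30 seconds.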
The WAF blocks. Send a request that trips the managed ruleset and expect a 403.
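# URL-encode the payload; a raw space in the URL makes curl reject it before the WAF ever sees it.
curl -sS -o /dev/null -w "%{http_code}\n" -G \
--data-urlencode "q=' OR 1=1--" \
"https://app-gtm-xxxx.z01.azurefd.net/"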
Run a failover drill. Disable the primary origin and watch the edge shift traffic without any client-side change:
az afd origin update -g $RG --profile-name $PROFILE \
--origin-group-name og-web --origin-name eastus2 \
--enabled-state Disabled
# Repeated curls should keep returning 200, now served by West Europe.
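The repeated curls in the drill can be scripted so the result is a tally rather than a scrolling terminal. A minimal sketch, assuming the placeholder hostname used throughout this article; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Placeholder endpoint; substitute your own Front Door hostname.
URL="${URL:-https://app-gtm-xxxx.z01.azurefd.net/}"

# Print one HTTP status code per request, roughly once per second.
# curl reports 000 when it cannot connect or times out.
poll() {
  local n=$1
  for _ in $(seq "$n"); do
    curl -sS -o /dev/null -m 5 -w '%{http_code}\n' "$URL" || true
    sleep 1
  done
}

# Summarise status codes read from stdin as "code count" lines.
tally() { sort | uniq -c | awk '{print $2, $1}'; }

# Usage during the drill: poll 120 | tally
```

Anything other than a single 200 line in the tally marks requests that were lost during the shift. Re-enable the origin afterwards with --enabled-state Enabled.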
Observe routing decisions in logs. Enable diagnostic settings on the profile and query the access log; the originName field (originName_s in AzureDiagnostics, or backendHostname_s on classic Front Door profiles) shows which origin served each request. Column names are case-sensitive in KQL.
AzureDiagnostics
| where Category == "FrontDoorAccessLog"
| summarize count() by originName_s, httpStatusCode_s
| order by count_ desc
Pitfalls
A handful of traps catch even experienced teams:
- Shallow health probes. A probe that returns 200 while the app can’t reach its database keeps a broken origin in rotation. Exercise real dependencies, but keep the probe cheap enough that a 30s interval across every edge POP doesn’t hammer your backend.
- Expecting weighted routing to override latency. At equal priority, Front Door filters by latency window first. If regions are far apart, weights barely move traffic. Use priority for failover and treat weighting as a same-latency-tier tool.
- Forgetting the X-Azure-FDID check. The service tag permits every Front Door tenant in Azure. Without header validation, your “locked down” origin is open to anyone willing to stand up their own Front Door profile.
- Treating DNS failover as fast. Traffic Manager failover is bound by TTL plus resolver caching. Don’t put it on the critical path for sub-minute RTOs; that’s Front Door’s job.
- Tuning probe interval in isolation. Lowering the interval speeds detection but multiplies probe load (every edge POP probes independently). Tune interval, sample size, and samples-required together against a real drill, not in theory.
Next steps
Wire origin-health-flip alerts into your on-call rotation, add a synthetic canary that exercises the full path through Front Door every minute, and rehearse a full regional failover quarterly. Once active-passive is solid and your data layer supports it, graduate to active-active and measure the latency win against the added operational complexity.