
Routing All Egress Through Azure Firewall: UDRs, Forced Tunneling, and Policy

Deploying Azure Firewall in the hub is the easy part. Making sure traffic actually traverses it – inbound, outbound, and spoke-to-spoke – is where most hub-spoke designs quietly leak. This walkthrough builds a fully inspected topology with user-defined routes (UDRs) and Firewall Policy, then digs into the failure modes that look like firewall bugs but are really routing bugs.

The default-route problem: why a deployed firewall inspects nothing

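Every Azure subnet ships with invisible system routes. The two that matter here are the default route, 0.0.0.0/0 with next hop Internet, and the route for the VNet’s own address space, with next hop VirtualNetwork.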

A VM in a spoke reaches the internet because the system default route sends its traffic straight to Azure’s internet edge – not through your hub. Peering a spoke to the hub does not redirect that traffic; peering only adds system routes (next hop VNetPeering) for the peered ranges. So you can stand up Azure Firewall, point DNS at it, write beautiful rules, and still have every spoke egress to the internet directly, completely bypassing inspection.

The firewall only sees what you route to it. There is no transparent/inline mode in Azure – inspection is entirely a function of UDRs. If the effective route for a destination does not point at the firewall, that traffic is invisible to it.

The fix is to override system routes with UDRs whose next hop is the firewall’s private IP, applied to every spoke workload subnet and (for forced tunneling) the gateway subnet.
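To make the selection rule concrete, here is a minimal sketch of how an effective route gets chosen, assuming Azure's documented behaviour: the longest matching prefix wins, and for identical prefixes a UDR beats a BGP-learned route, which beats a system route. The function names (ip_to_int, in_cidr, pick_route) and the prefix|source|next_hop record format are my own, purely for illustration; the source labels mirror what effective-route output shows (User, VirtualNetworkGateway, Default).

```bash
#!/usr/bin/env bash
# Illustrative model only: it does not call Azure.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeeds if IP $1 falls inside CIDR $2.
in_cidr() {
  local ip net len mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  len=${2#*/}
  if (( len == 0 )); then
    mask=0
  else
    mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  fi
  (( (ip & mask) == (net & mask) ))
}

# pick_route DEST ROUTE...  where each ROUTE is "prefix|source|next_hop".
# Longest prefix wins; on a tie, User (UDR) > VirtualNetworkGateway (BGP) > Default.
pick_route() {
  local dest=$1; shift
  local best="" best_len=-1 best_rank=99 r prefix source len rank
  for r in "$@"; do
    IFS='|' read -r prefix source _ <<< "$r"
    in_cidr "$dest" "$prefix" || continue
    len=${prefix#*/}
    case $source in
      User) rank=0 ;;
      VirtualNetworkGateway) rank=1 ;;
      *) rank=2 ;;
    esac
    if (( len > best_len || (len == best_len && rank < best_rank) )); then
      best=$r; best_len=$len; best_rank=$rank
    fi
  done
  echo "$best"
}

routes=(
  "0.0.0.0/0|Default|Internet"
  "10.1.0.0/16|Default|VnetLocal"
  "0.0.0.0/0|User|10.0.1.4"
)
pick_route 52.1.2.3  "${routes[@]}"   # prints 0.0.0.0/0|User|10.0.1.4
pick_route 10.1.0.10 "${routes[@]}"   # prints 10.1.0.0/16|Default|VnetLocal
```

The second call is the useful reminder: a 0.0.0.0/0 UDR never captures traffic that a more specific route already covers, so intra-VNet and peered ranges need their own UDRs if you want them inspected.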

Step 1 – Deploy Azure Firewall with a Firewall Policy and rule collection groups

Use Firewall Policy (not classic rules). Policy is the modern object model: it supports rule collection groups, hierarchy/inheritance (a base policy plus child policies), IDPS, TLS inspection, and central management across firewalls.

The hierarchy is strict and worth memorizing:

Firewall Policy
└─ Rule Collection Group   (has a priority; an ordering container)
   └─ Rule Collection      (has a priority + action: Allow/Deny for net/app, or DNAT)
      └─ Rule              (the actual match: source, dest, port, protocol/FQDN)

Processing order across types is fixed regardless of priority numbers: DNAT first, then Network, then Application. Within a type, lower priority number wins.
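A small sketch of that ordering, assuming the rule stated above: rule type first (DNAT, Network, Application), then rule collection group priority, then rule collection priority. The evaluation_order function and its whitespace-separated input format are hypothetical helpers for reasoning about a policy on paper, and the "hypothetical-early-app" collection is invented to show that a low priority number does not let an application rule jump ahead of a network rule.

```bash
#!/usr/bin/env bash
# Print rule collections in the order the firewall would evaluate them.
# Input (stdin), one collection per line:
#   <type> <group-priority> <collection-priority> <name>
# where <type> is DNAT, Network, or Application.
evaluation_order() {
  local type gprio cprio name rank
  while read -r type gprio cprio name; do
    case $type in
      DNAT) rank=0 ;;
      Network) rank=1 ;;
      Application) rank=2 ;;
      *) echo "skipping unknown rule type: $type" >&2; continue ;;
    esac
    printf '%d %d %d %s %s\n' "$rank" "$gprio" "$cprio" "$type" "$name"
  done | sort -n -k1,1 -k2,2 -k3,3 | cut -d' ' -f4-
}

evaluation_order <<'EOF'
Application 50 100 hypothetical-early-app
Application 300 300 allow-os-updates
Network 200 200 allow-spoke-to-spoke
DNAT 100 100 inbound-web
EOF
# prints:
#   DNAT inbound-web
#   Network allow-spoke-to-spoke
#   Application hypothetical-early-app
#   Application allow-os-updates
```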

The firewall needs a dedicated /26 subnet named exactly AzureFirewallSubnet. Deploy:

RG=rg-hub-prod
LOC=eastus2
HUB_VNET=vnet-hub

# Dedicated subnet for the firewall (must be this exact name, /26 minimum)
az network vnet subnet create \
  --resource-group "$RG" --vnet-name "$HUB_VNET" \
  --name AzureFirewallSubnet --address-prefixes 10.0.1.0/26

# Public IP for SNAT egress (Standard SKU, static)
az network public-ip create \
  --resource-group "$RG" --name pip-azfw \
  --sku Standard --allocation-method Static

# Firewall Policy (Premium unlocks IDPS + TLS inspection)
az network firewall policy create \
  --resource-group "$RG" --name afwp-hub \
  --sku Premium --location "$LOC"

# The firewall itself, bound to the policy
az network firewall create \
  --resource-group "$RG" --name afw-hub --location "$LOC" \
  --sku AZFW_VNet --tier Premium \
  --firewall-policy afwp-hub

Associate the public IP and the firewall’s subnet via an IP configuration, then capture the private IP – this is the next-hop address every UDR will use:

az network firewall ip-config create \
  --resource-group "$RG" --firewall-name afw-hub \
  --name fw-ipconfig --vnet-name "$HUB_VNET" \
  --public-ip-address pip-azfw

# Persist the firewall private IP for the UDRs that follow
FW_PRIVATE_IP=$(az network firewall show \
  --resource-group "$RG" --name afw-hub \
  --query "ipConfigurations[0].privateIPAddress" -o tsv)
echo "Firewall private IP: $FW_PRIVATE_IP"   # e.g. 10.0.1.4
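Every UDR from here on depends on that variable, so it can be worth failing fast if the query came back empty (for example, because the IP configuration had not finished provisioning). This is a minimal guard, not part of the Azure CLI; is_ipv4 and require_fw_ip are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Succeeds only if $1 is a dotted-quad IPv4 address with each octet in 0-255.
is_ipv4() {
  local ip=${1:-} octet
  [[ $ip =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$ ]] || return 1
  for octet in "${BASH_REMATCH[@]:1}"; do
    # 10# forces base 10 so an octet like "08" is not parsed as octal
    (( 10#$octet <= 255 )) || return 1
  done
  return 0
}

# Refuse to continue with a missing or malformed next-hop address.
require_fw_ip() {
  if ! is_ipv4 "${1:-}"; then
    echo "Firewall private IP is missing or malformed: '${1:-}'" >&2
    return 1
  fi
}

# Usage, right after capturing the variable:
#   require_fw_ip "$FW_PRIVATE_IP" || exit 1
```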

Now lay down the rule collection groups. I separate them by concern so priorities stay sane as the estate grows:

# Group 1: DNAT (lowest number = evaluated earliest among groups)
az network firewall policy rule-collection-group create \
  --resource-group "$RG" --policy-name afwp-hub \
  --name rcg-dnat --priority 100

# Group 2: Network rules (L3/L4 allow/deny)
az network firewall policy rule-collection-group create \
  --resource-group "$RG" --policy-name afwp-hub \
  --name rcg-network --priority 200

# Group 3: Application rules (L7 FQDN filtering)
az network firewall policy rule-collection-group create \
  --resource-group "$RG" --policy-name afwp-hub \
  --name rcg-application --priority 300

Step 2 – Write UDRs to pull spoke egress through the firewall

Create a route table, add the override default route, and associate it to each spoke workload subnet. --next-hop-type VirtualAppliance plus the firewall private IP is the entire trick.

SPOKE_RG=rg-spoke-app
SPOKE_VNET=vnet-spoke-app

az network route-table create \
  --resource-group "$SPOKE_RG" --name rt-spoke-egress

# Override the system 0.0.0.0/0 -> Internet route; send it to the firewall
az network route-table route create \
  --resource-group "$SPOKE_RG" --route-table-name rt-spoke-egress \
  --name default-to-firewall \
  --address-prefix 0.0.0.0/0 \
  --next-hop-type VirtualAppliance \
  --next-hop-ip-address "$FW_PRIVATE_IP"

# Apply to the workload subnet (NOT to AzureFirewallSubnet)
az network vnet subnet update \
  --resource-group "$SPOKE_RG" --vnet-name "$SPOKE_VNET" \
  --name snet-workload \
  --route-table rt-spoke-egress

Never associate a 0.0.0.0/0 -> firewall route with AzureFirewallSubnet. The firewall would route its own SNAT’d egress back to itself, creating a loop. Unless you are forced tunneling (covered later), the firewall subnet must keep its system default route to Internet.
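If associations are scripted, one way to make that mistake harder is a guard in front of the association command. The sketch below is an assumption about how you might structure such a wrapper, not an Azure feature: can_take_spoke_route_table and associate_spoke_route_table are hypothetical names, and the management subnet (which only exists with forced tunneling) is included because the same restriction applies to it.

```bash
#!/usr/bin/env bash
# Succeeds if the spoke egress route table may be associated with subnet $1.
# The firewall's own reserved subnets are refused.
can_take_spoke_route_table() {
  case ${1:-} in
    AzureFirewallSubnet|AzureFirewallManagementSubnet)
      echo "refusing: $1 must not receive the spoke route table" >&2
      return 1 ;;
    "")
      echo "refusing: no subnet name given" >&2
      return 1 ;;
    *)
      return 0 ;;
  esac
}

# Hypothetical wrapper around the association command shown above.
# Args: resource group, VNet name, subnet name, route table name.
associate_spoke_route_table() {
  local rg=$1 vnet=$2 subnet=$3 route_table=$4
  can_take_spoke_route_table "$subnet" || return 1
  az network vnet subnet update \
    --resource-group "$rg" --vnet-name "$vnet" \
    --name "$subnet" --route-table "$route_table"
}

# Usage:
#   associate_spoke_route_table "$SPOKE_RG" "$SPOKE_VNET" snet-workload rt-spoke-egress
```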

Two longest-prefix-match details that bite people:

In Terraform the same intent is more maintainable across many spokes:

resource "azurerm_route_table" "spoke_egress" {
  name                = "rt-spoke-egress"
  location            = var.location
  resource_group_name = var.spoke_rg

  route {
    name                   = "default-to-firewall"
    address_prefix         = "0.0.0.0/0"
    next_hop_type          = "VirtualAppliance"
    next_hop_in_ip_address = var.fw_private_ip
  }
}

resource "azurerm_subnet_route_table_association" "workload" {
  subnet_id      = azurerm_subnet.workload.id
  route_table_id = azurerm_route_table.spoke_egress.id
}

Step 3 – Force east-west (spoke-to-spoke) traffic through the hub

By default, two spokes peered to the same hub cannot talk to each other: peering is non-transitive, so there is no data path between them unless you peer them directly or route between them through a gateway or appliance in the hub. The clean, inspected pattern is: add a UDR on each spoke that routes the other spokes’ ranges to the firewall, and let the hub forward between them.

For this to work the firewall acts as a router between spokes. That requires each spoke-to-hub peering to allow forwarded traffic (the --allow-forwarded-traffic peering setting), and a route on each spoke pointing the remote spoke CIDRs at the firewall:

# On spoke-app, send traffic destined for spoke-data to the firewall
az network route-table route create \
  --resource-group "$SPOKE_RG" --route-table-name rt-spoke-egress \
  --name to-spoke-data \
  --address-prefix 10.2.0.0/16 \
  --next-hop-type VirtualAppliance \
  --next-hop-ip-address "$FW_PRIVATE_IP"

# Symmetrically, on spoke-data, route spoke-app's range to the firewall
az network route-table route create \
  --resource-group rg-spoke-data --route-table-name rt-data-egress \
  --name to-spoke-app \
  --address-prefix 10.1.0.0/16 \
  --next-hop-type VirtualAppliance \
  --next-hop-ip-address "$FW_PRIVATE_IP"

The symmetry is mandatory. If spoke-app routes to spoke-data via the firewall but spoke-data replies directly (because it has no return UDR), you get classic asymmetric routing – the firewall sees a SYN with no matching return flow, drops the out-of-state packets, and the connection hangs. More on diagnosing that below.
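With more than two spokes, writing the pairs by hand is where a missing return route creeps in. The sketch below prints (it does not run) one route-create command for every ordered pair of spokes so the mesh is symmetric by construction. The function name, the rg:route-table:cidr argument format, and the generated route names are all hypothetical conventions for this example.

```bash
#!/usr/bin/env bash
# Print the route-create commands for a full, symmetric east-west mesh.
# Usage: print_east_west_routes FW_IP "<rg>:<route-table>:<cidr>" ...
print_east_west_routes() {
  local fw_ip=$1; shift
  local src dst src_rg src_rt dst_cidr route_name
  for src in "$@"; do
    IFS=: read -r src_rg src_rt _ <<< "$src"
    for dst in "$@"; do
      if [[ $dst == "$src" ]]; then continue; fi
      dst_cidr=${dst##*:}
      # Derive a route name such as to-10-2-0-0-16 from the remote CIDR
      route_name=${dst_cidr//./-}
      route_name="to-${route_name//\//-}"
      printf 'az network route-table route create --resource-group %s --route-table-name %s --name %s --address-prefix %s --next-hop-type VirtualAppliance --next-hop-ip-address %s\n' \
        "$src_rg" "$src_rt" "$route_name" "$dst_cidr" "$fw_ip"
    done
  done
}

# Two spokes produce two commands, one per direction.
print_east_west_routes 10.0.1.4 \
  "rg-spoke-app:rt-spoke-egress:10.1.0.0/16" \
  "rg-spoke-data:rt-data-egress:10.2.0.0/16"
```

Review the printed commands before piping them to a shell; N spokes yield N x (N - 1) routes, so three spokes already mean six.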

Then a network rule collection lets the inspected east-west flow through:

az network firewall policy rule-collection-group collection add-filter-collection \
  --resource-group "$RG" --policy-name afwp-hub \
  --rule-collection-group-name rcg-network \
  --name allow-spoke-to-spoke \
  --collection-priority 200 \
  --action Allow \
  --rule-name app-to-data \
  --rule-type NetworkRule \
  --source-addresses 10.1.0.0/16 \
  --destination-addresses 10.2.0.0/16 \
  --destination-ports 443 1433 \
  --ip-protocols TCP

Step 4 – DNAT for inbound, FQDN application rules for outbound

Inbound with DNAT

To publish a workload, DNAT translates firewallPublicIP:port to the private backend. The DNAT rule implicitly creates the matching network allow, but the return path still depends on a UDR – the backend’s subnet must route 0.0.0.0/0 (or at least the client ranges) back through the firewall, or the response goes out the system default route and the flow is asymmetric.

PIP=$(az network public-ip show -g "$RG" -n pip-azfw --query ipAddress -o tsv)

az network firewall policy rule-collection-group collection add-nat-collection \
  --resource-group "$RG" --policy-name afwp-hub \
  --rule-collection-group-name rcg-dnat \
  --name inbound-web \
  --collection-priority 100 \
  --action DNAT \
  --rule-name https-to-appvm \
  --destination-addresses "$PIP" \
  --destination-ports 443 \
  --source-addresses "*" \
  --translated-address 10.1.0.10 \
  --translated-port 443 \
  --ip-protocols TCP

Outbound with FQDN application rules

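Application rules filter HTTP, HTTPS, and MSSQL traffic by destination name rather than IP (FQDN filtering for other TCP/UDP protocols lives in network rules and needs DNS proxy enabled). For HTTPS the firewall matches on the TLS SNI, so you can allow *.ubuntu.com without hardcoding IPs: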

az network firewall policy rule-collection-group collection add-filter-collection \
  --resource-group "$RG" --policy-name afwp-hub \
  --rule-collection-group-name rcg-application \
  --name allow-os-updates \
  --collection-priority 300 \
  --action Allow \
  --rule-name linux-repos \
  --rule-type ApplicationRule \
  --source-addresses 10.1.0.0/16 \
  --protocols Http=80 Https=443 \
  --target-fqdns "*.ubuntu.com" "*.azure.com"

Application rules always SNAT, whatever the destination, because the firewall proxies the connection. Network rules SNAT by default only when the destination is a public IP – which is exactly what drives the port-exhaustion failure mode later. FQDN tags (WindowsUpdate, AzureKubernetesService, etc.) are a convenient shortcut for well-known endpoint sets.

Forced tunneling to on-prem and the management subnet requirement

If compliance demands that internet-bound traffic exit through your on-prem perimeter rather than Azure’s edge, you enable forced tunneling. This is a deploy-time decision and has a hard prerequisite: a second dedicated subnet named exactly AzureFirewallManagementSubnet (also /26) with its own public IP. This carries the firewall’s management plane traffic (health, signature updates, logging) so the control plane stays reachable even after you redirect the data plane’s default route to on-prem.

# Required management subnet for forced tunneling
az network vnet subnet create \
  --resource-group "$RG" --vnet-name "$HUB_VNET" \
  --name AzureFirewallManagementSubnet --address-prefixes 10.0.1.64/26

az network public-ip create \
  --resource-group "$RG" --name pip-azfw-mgmt \
  --sku Standard --allocation-method Static

The firewall must be created with both a data-path IP configuration and a management IP configuration (the management public IP bound to AzureFirewallManagementSubnet) to enable forced tunneling – it cannot be added to an existing firewall that lacks it without redeployment. Once enabled, you place a UDR on AzureFirewallSubnet sending 0.0.0.0/0 to your on-prem next hop (the VPN/ER gateway or an on-prem NVA). Crucially, you do not put a default route on the management subnet – it must retain its direct Internet route.

Subnet | Default route (0.0.0.0/0) | Public IP
AzureFirewallSubnet | -> on-prem gateway (forced tunnel) | data-path PIP (still required)
AzureFirewallManagementSubnet | -> Internet (system route, untouched) | management PIP
Spoke workload subnets | -> firewall private IP | none

The single most common forced-tunneling outage: someone associates the spoke’s 0.0.0.0/0 -> firewall route table to the management subnet, or removes the management PIP. The firewall loses its control plane and transitions to a failed state. Keep the management subnet’s routing pristine.

Diagnosing asymmetric routing and SNAT-port exhaustion

Asymmetric routing

Symptoms: connections that “sometimes” work, TCP handshakes that hang, pings succeeding while TCP fails. Azure Firewall is stateful – it only permits return packets belonging to a flow it already tracks. If the return path skips the firewall, those packets are dropped as out-of-state.

Root causes, in order of frequency:

  1. A UDR on one side but not the other (east-west, see Step 3).
  2. A DNAT’d inbound flow whose backend subnet lacks a return UDR.
  3. On-prem advertising a route over ExpressRoute/VPN that pulls the return path around the firewall.

Confirm it with effective routes on the actual NIC (the source of truth – it merges system routes, UDRs, and BGP):

az network nic show-effective-route-table \
  --resource-group "$SPOKE_RG" --name nic-appvm-01 \
  --output table

If both directions don’t resolve to VirtualAppliance at the firewall IP, you have your answer. Network Watcher’s connection troubleshoot / next-hop checks confirm the per-hop decision:

az network watcher show-next-hop \
  --resource-group "$SPOKE_RG" \
  --vm appvm-01 \
  --source-ip 10.1.0.10 \
  --dest-ip 10.2.0.10 \
  --nic nic-appvm-01
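Reading the table by eye works for one NIC; for many, a scripted check is easier. The sketch below reads tab-separated route rows and succeeds only if the Active route for a given prefix is VirtualAppliance at the firewall IP. route_hits_firewall is a hypothetical helper, and the JMESPath query in the usage comment assumes the effective-route JSON exposes value[].addressPrefix, nextHopType, nextHopIpAddress, and state, with empty next-hop fields rendered as a placeholder token rather than an empty column; check both against your CLI version.

```bash
#!/usr/bin/env bash
# Reads "prefix<TAB>nextHopType<TAB>nextHopIp<TAB>state" rows on stdin.
# Succeeds only if the Active route for PREFIX points at FW_IP as a
# VirtualAppliance next hop.
# Usage: ... | route_hits_firewall PREFIX FW_IP
route_hits_firewall() {
  local want_prefix=$1 fw_ip=$2 prefix hop_type hop_ip state
  while IFS=$'\t' read -r prefix hop_type hop_ip state; do
    # Ignore other prefixes and routes that lost the selection (Invalid)
    [[ $prefix == "$want_prefix" && $state == Active ]] || continue
    if [[ $hop_type == VirtualAppliance && $hop_ip == "$fw_ip" ]]; then
      return 0
    fi
    echo "active $prefix route goes to $hop_type $hop_ip, not the firewall" >&2
    return 1
  done
  echo "no active route for $want_prefix" >&2
  return 1
}

# Usage (assumed query shape):
#   az network nic show-effective-route-table -g "$SPOKE_RG" -n nic-appvm-01 \
#     --query "value[].[addressPrefix[0], nextHopType, nextHopIpAddress[0], state]" -o tsv \
#     | route_hits_firewall 0.0.0.0/0 "$FW_PRIVATE_IP"
```

Run it once per direction, against a NIC in each spoke, since asymmetry by definition only shows up when you compare both sides.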

SNAT-port exhaustion

When the firewall SNATs outbound traffic to a public destination, every concurrent flow to the same destination IP, port, and protocol consumes one ephemeral source port on one of the firewall’s public IPs. A public IP has roughly 64K ports in total, but the firewall exposes far fewer to any one destination: Microsoft documents 2,496 SNAT ports per public IP per backend firewall instance, with a minimum of two instances, so roughly 5K concurrent flows to a single destination endpoint per public IP. High-fanout workloads – think thousands of nodes hammering one API endpoint – exhaust those ports, and new connections fail with timeouts that masquerade as firewall blocks.
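A back-of-the-envelope calculation helps when sizing public IPs. The sketch below simply multiplies the three factors involved; the 2,496 ports per public IP per backend instance and the two-instance minimum used in the examples are figures assumed from Microsoft's published Azure Firewall limits, so verify them against the current documentation before relying on the result. snat_ports_per_flow is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# SNAT ports available to one flow tuple (destination IP, port, protocol).
# Args: ports per public IP per backend instance, backend instances, public IPs.
snat_ports_per_flow() {
  local per_ip_per_instance=$1 instances=$2 public_ips=$3
  echo $(( per_ip_per_instance * instances * public_ips ))
}

snat_ports_per_flow 2496 2 1   # prints 4992
snat_ports_per_flow 2496 2 5   # prints 24960
```

Capacity grows linearly with public IPs, which is why attaching more of them is the first lever to reach for.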

The fixes, in order of leverage:

# Scale SNAT capacity by attaching additional public IPs
az network public-ip create -g "$RG" -n pip-azfw-2 --sku Standard --allocation-method Static
az network firewall ip-config create \
  --resource-group "$RG" --firewall-name afw-hub \
  --name fw-ipconfig-2 --public-ip-address pip-azfw-2

Network-rule traffic to private destinations (spoke-to-spoke, on-prem) is not SNAT’d by default, so east-west flows don’t consume SNAT ports. You can also disable SNAT for specified IP ranges (e.g. all RFC 1918) via the policy’s private-range setting when on-prem must see original client IPs.

Centralized logging, IDPS, and validation

Send firewall logs to Log Analytics via a diagnostic setting; resource-specific tables (AZFWNetworkRule, AZFWApplicationRule, AZFWNatRule, AZFWIdpsSignature) are far cheaper to query than the legacy AzureDiagnostics table:

LAW_ID=$(az monitor log-analytics workspace show \
  -g "$RG" -n law-hub --query id -o tsv)

az monitor diagnostic-settings create \
  --name afw-to-law \
  --resource "$(az network firewall show -g "$RG" -n afw-hub --query id -o tsv)" \
  --workspace "$LAW_ID" \
  --export-to-resource-specific true \
  --logs '[{"categoryGroup":"allLogs","enabled":true}]'

On Premium, turn on IDPS in the policy for signature-based detection. Run it in Alert mode first to baseline false positives, then move to Alert+Deny once the signal is clean:

az network firewall policy intrusion-detection add \
  --resource-group "$RG" --policy-name afwp-hub \
  --mode Alert

A quick KQL sanity check that flows are actually being inspected and decided:

AZFWApplicationRule
| where TimeGenerated > ago(15m)
| summarize count() by Action, Fqdn, Rule
| order by count_ desc

Enterprise scenario

A payments platform team flipped their entire AKS landing zone to forced tunneling for PCI sign-off. The hub firewall’s AzureFirewallSubnet got the 0.0.0.0/0 -> on-prem ER gateway UDR, on-prem advertised a default route over ExpressRoute, and within an hour every new pod hung on image pulls and the cluster’s egress to the Azure Container Registry timed out. The kicker: it looked like a firewall block, but the firewall logs showed nothing being denied.

The trap was BGP. With a default route learned over ExpressRoute and route propagation enabled on the spoke route tables, the 0.0.0.0/0 from on-prem became the effective default for any traffic the explicit -> firewall UDR didn’t cover. (A UDR beats a BGP route for the same prefix, so the learned default only wins where the UDR is missing or a more specific learned prefix applies.) Effective routes told the whole story instantly:

az network nic show-effective-route-table -g rg-aks -n aks-node-nic-0 -o table
# 0.0.0.0/0  BGP  10.250.0.4 (on-prem)  <-- NOT the firewall, NOT forced-tunnel path

The fix had two parts. First, disable route propagation on the spoke route tables so a learned default can never silently replace the firewall next hop:

az network route-table update -g rg-aks -n rt-aks-egress \
  --disable-bgp-route-propagation true

Second, keep ACR and AKS control-plane egress on the inspected path with an explicit FQDN rule (AzureKubernetesService tag plus *.azurecr.io) rather than relying on the default route at all. After that, every node’s effective route resolved to VirtualAppliance at the firewall, on-prem saw the SNAT’d egress, and PCI got their single audited choke point. Lesson: forced tunneling and ExpressRoute default routes fight over the same 0.0.0.0/0, and the BGP route wins on any subnet where your UDR is not in force, so stop it from propagating.

Verify

Run these after every routing change – effective routes are the only truth that matters.

# 1. The spoke NIC's default route points at the firewall, not Internet
az network nic show-effective-route-table \
  -g "$SPOKE_RG" -n nic-appvm-01 -o table
# Expect: 0.0.0.0/0  VirtualAppliance  10.0.1.4  (Active)

# 2. From a spoke VM, an allowed FQDN succeeds and an un-allowed one is blocked
#    (run inside the VM via run-command)
az vm run-command invoke -g "$SPOKE_RG" -n appvm-01 \
  --command-id RunShellScript \
  --scripts "curl -s -o /dev/null -w '%{http_code}' https://www.azure.com; echo; curl -s -m 5 -o /dev/null -w '%{http_code}' https://www.example-not-allowed.com || echo BLOCKED"

# 3. Next hop for an east-west destination resolves to the firewall both ways
az network watcher show-next-hop -g "$SPOKE_RG" --vm appvm-01 \
  --source-ip 10.1.0.10 --dest-ip 10.2.0.10 --nic nic-appvm-01

# 4. Inbound DNAT reaches the backend (run from outside Azure with PIP set;
#    -k because the certificate will not match a bare IP address)
curl -skv "https://$PIP/healthz"

In Log Analytics, confirm the allow/deny verdicts line up with intent:

AZFWNetworkRule
| where TimeGenerated > ago(30m)
| project TimeGenerated, SourceIp, DestinationIp, DestinationPort, Action, Rule
| order by TimeGenerated desc

Rollout checklist

Pitfalls and next steps

The recurring theme: Azure Firewall inspects only what your routes send it, and it drops anything whose return path it didn’t see. Most “the firewall is broken” tickets are really one of three things – a missing UDR, an asymmetric return path, or SNAT exhaustion. Validate with effective routes, not by reading your route-table definitions, because routes learned over BGP from an ExpressRoute/VPN gateway can silently change the outcome: a more specific learned prefix beats your 0.0.0.0/0 UDR, and a learned default takes over wherever no UDR is in force.

From here, layer in Firewall Policy hierarchy (a platform-team base policy inherited by per-landing-zone child policies), centralize a single firewall across a connectivity subscription in an Azure Landing Zone, and codify the entire UDR + policy surface in Terraform so new spokes inherit inspected egress by default rather than by manual association.
