ExpressRoute Deep Dive: Private Peering, Route Filters, and VPN Failover

ExpressRoute gives you a private, predictable path into Azure that never touches the public internet, but the resiliency story only holds together if BGP is configured deliberately and there is a tested backup path waiting. This guide provisions a circuit end to end, stands up private peering with the mandatory redundant BGP pair, controls advertised routes, enables FastPath, and then wires a Site-to-Site VPN as a true backup using connection weight and AS-path prepending so failover is automatic and measurable.

ExpressRoute building blocks

Three concepts get conflated constantly, so pin them down before touching the portal:

Term | What it is
Circuit | The logical Azure resource representing your connectivity through a provider. It has a service key (s-key), a bandwidth, a SKU, and a billing model. It is not a physical cable.
Peering location | The physical facility (a colocation/meet-me room, e.g. “Silicon Valley”, “London2”) where Microsoft’s edge (MSEE) and your provider interconnect. This is distinct from the Azure region.
Peering | A BGP routing domain over the circuit. Private peering carries traffic to your VNets (RFC 1918 and beyond). Microsoft peering carries traffic to Microsoft public services (Microsoft 365, public PaaS endpoints) over public IPs you own.

A circuit always has a primary and secondary connection to two MSEEs for redundancy; this is not optional and not something you turn off. Two BGP sessions come up per peering, one to each MSEE. Your job is to terminate both and let BGP do its work.

Microsoft peering is for Microsoft public endpoints, not “the internet.” It does not give you general internet egress and it does not replace a default route. Most enterprises only need private peering plus Private Link for PaaS; reach for Microsoft peering only when you have a concrete need like Microsoft 365 over ExpressRoute.

Step 1 - Provision the circuit and complete the provider handoff

Create the circuit first. The SKU family (MeteredData vs UnlimitedData) is your billing model; the SKU tier (Standard vs Premium) controls reach and route limits. Premium raises the private-peering route limit from 4,000 to 10,000 and allows connectivity to VNets in any geopolitical region.

az network express-route create \
  --name erc-prod-svc \
  --resource-group rg-connectivity \
  --location eastus2 \
  --provider "Equinix" \
  --peering-location "Silicon Valley" \
  --bandwidth 1000 \
  --sku-family MeteredData \
  --sku-tier Standard

The circuit is born in Provider status: NotProvisioned. Retrieve the service key and hand it to your provider; they use it to map your physical cross-connect to this logical circuit.

az network express-route show \
  --name erc-prod-svc \
  --resource-group rg-connectivity \
  --query "{key:serviceKey, provider:serviceProviderProvisioningState, circuit:circuitProvisioningState}" \
  -o table

Do not configure peering until the provider flips the state. Poll until serviceProviderProvisioningState returns Provisioned:

az network express-route show \
  --name erc-prod-svc \
  --resource-group rg-connectivity \
  --query serviceProviderProvisioningState -o tsv
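
The polling step can be wrapped in a small loop rather than re-run by hand. The following is a minimal sketch, not a definitive implementation: the function names, the 60-second interval, and the attempt cap are assumptions for illustration, and `az_provider_state` simply shells out to the same `az network express-route show` command shown above.

```python
import subprocess
import time
from typing import Callable


def az_provider_state(name: str = "erc-prod-svc",
                      resource_group: str = "rg-connectivity") -> str:
    """Return the provider provisioning state by calling the az CLI (must be installed and logged in)."""
    result = subprocess.run(
        ["az", "network", "express-route", "show",
         "--name", name, "--resource-group", resource_group,
         "--query", "serviceProviderProvisioningState", "-o", "tsv"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def wait_for_provisioned(fetch_state: Callable[[], str],
                         interval_s: float = 60.0,
                         max_attempts: int = 120,
                         sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll fetch_state() until it returns 'Provisioned'; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if fetch_state() == "Provisioned":
            return attempt
        if attempt < max_attempts:
            sleep(interval_s)  # hypothetical interval; tune to your provider's lead time
    raise TimeoutError(f"circuit not Provisioned after {max_attempts} attempts")
```

In practice you would call `wait_for_provisioned(az_provider_state)` and only then move on to peering. Passing the fetch function in as a parameter keeps the loop testable without a live circuit.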

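Billing starts when the service key is issued, that is, as soon as you create the circuit resource, not when the provider finishes provisioning and not when you send your first packet. If you create the circuit long before the provider can provision it, you are paying for an idle circuit. Create the circuit, and sequence the provider order, close to when you can actually configure peering.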

Step 2 - Configure private peering and the BGP pair

Private peering needs two /30 point-to-point subnets (or a single /29 that Azure splits into two /30s), one for the primary link and one for the secondary. Your edge router takes the first usable address in each /30; Microsoft’s MSEE takes the second. You also supply your peer ASN (2-byte or 4-byte; private ASNs in the 64512-65514 range are fine on your side) and, in production, an MD5 hash to authenticate the session.

az network express-route peering create \
  --circuit-name erc-prod-svc \
  --resource-group rg-connectivity \
  --peering-type AzurePrivatePeering \
  --peer-asn 65010 \
  --primary-peer-subnet 192.168.10.0/30 \
  --secondary-peer-subnet 192.168.10.4/30 \
  --vlan-id 100 \
  --shared-key "<md5-secret>"

The VLAN ID is the C-tag (inner dot1q) for this peering and must match what your provider configures on the cross-connect. Microsoft peering, if used, takes a different VLAN.

On your edge router (the example below is Cisco IOS-style; translate to your platform), you bring up two eBGP sessions. Microsoft’s MSEE always uses ASN 12076.

! Primary link: your router is .1 (first usable), the MSEE is .2
interface GigabitEthernet0/1.100
 encapsulation dot1Q 100
 ip address 192.168.10.1 255.255.255.252
!
router bgp 65010
 neighbor 192.168.10.2 remote-as 12076
 neighbor 192.168.10.2 password <md5-secret>
 address-family ipv4
  neighbor 192.168.10.2 activate
  network 10.50.0.0 mask 255.255.0.0
!
! Secondary link mirrors the above: 192.168.10.5 (your router) -> 192.168.10.6 (MSEE)
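
Getting the two addresses in each /30 the wrong way round is an easy mistake, so it can help to derive them rather than type them. This is a small sketch using Python's standard `ipaddress` module. It assumes the convention in Microsoft's ExpressRoute routing requirements, where your router uses the first usable address of each /30 and the MSEE uses the second; the function names are hypothetical.

```python
import ipaddress
from typing import Dict, Tuple


def split_29(block: str) -> Tuple[str, str]:
    """Split a /29 into the primary and secondary /30 link subnets."""
    net = ipaddress.ip_network(block)
    if net.version != 4 or net.prefixlen != 29:
        raise ValueError("expected an IPv4 /29")
    primary, secondary = net.subnets(new_prefix=30)
    return str(primary), str(secondary)


def peering_addresses(subnet: str) -> Dict[str, str]:
    """Return the two usable addresses of a /30 link subnet, labelled by role."""
    net = ipaddress.ip_network(subnet)
    if net.version != 4 or net.prefixlen != 30:
        raise ValueError("expected an IPv4 /30")
    first, second = net.hosts()  # a /30 has exactly two usable host addresses
    # Assumed convention: customer router = first usable, MSEE = second usable.
    return {"customer_router": str(first), "msee": str(second)}


if __name__ == "__main__":
    for link in split_29("192.168.10.0/29"):
        print(link, peering_addresses(link))
```

For the subnets used in this guide, the primary link resolves to 192.168.10.1 (router) and 192.168.10.2 (MSEE), and the secondary to 192.168.10.5 and 192.168.10.6.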

Both sessions should reach Established. From Azure, confirm ARP and BGP health per link:

az network express-route list-arp-tables \
  --resource-group rg-connectivity \
  --name erc-prod-svc \
  --path primary --peering-name AzurePrivatePeering -o table

az network express-route list-route-tables \
  --resource-group rg-connectivity \
  --name erc-prod-svc \
  --path primary --peering-name AzurePrivatePeering -o table

Run the same with --path secondary. If only one side is up, you have a redundant circuit running non-redundant - treat it as a P1.

To actually reach VNets, the circuit must connect to an ExpressRoute gateway (an ErGw1Az/ErGw2Az/ErGw3Az SKU VNet gateway, not a VPN gateway) via a connection:

az network vpn-connection create \
  --name cn-er-hub \
  --resource-group rg-connectivity \
  --vnet-gateway1 ergw-hub \
  --express-route-circuit2 erc-prod-svc \
  --routing-weight 0

Note --routing-weight 0 here - this is the ExpressRoute connection weight, and it matters when you design the VPN backup path later in this guide.

Step 3 - Control routes with route filters and advertised prefixes

For private peering, you control what Azure learns by what your routers advertise (the network/redistribution statements above), and you control what your routers accept with inbound BGP route maps on your side. Azure advertises your VNet address spaces plus any routes published from connected VNets. Keep the advertised set tight: every prefix counts against the route limit (covered in the monitoring section near the end), and a sloppy redistribution can blackhole traffic.

Route filters are a separate mechanism that applies to Microsoft peering only. Microsoft peering would otherwise advertise the full set of Microsoft public prefixes (all regions, all services). A route filter is an allow-list of BGP community values that scopes this down to the services and regions you actually consume.

az network route-filter create \
  --name rf-m365 \
  --resource-group rg-connectivity \
  --location eastus2

# Allow only specific service communities (e.g. Exchange, SharePoint, a region)
az network route-filter rule create \
  --resource-group rg-connectivity \
  --filter-name rf-m365 \
  --name allow-services \
  --access Allow \
  --communities 12076:5010 12076:5020 \
  --route-filter-rule-type Community

Then attach the filter to the Microsoft peering. Without an attached route filter, a Microsoft peering advertises nothing - the filter is what activates prefix advertisement.

az network express-route peering update \
  --circuit-name erc-prod-svc \
  --resource-group rg-connectivity \
  --name MicrosoftPeering \
  --route-filter rf-m365

Get the BGP community values from Microsoft’s published service/region community list rather than guessing. The communities 12076:5010 etc. above are placeholders for the pattern; the real values per service and region are documented and change as services are added.
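
To make the allow-list behaviour concrete, here is a toy model of how a community-based route filter selects prefixes. It is a sketch only: the prefixes come from documentation ranges, the community values (including 12076:5030) are placeholders in the same spirit as the ones above, and nothing here reflects how Azure implements filtering internally.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Route:
    prefix: str
    communities: FrozenSet[str]


def apply_route_filter(routes: Iterable[Route], allowed: Iterable[str]) -> List[Route]:
    """Keep only routes tagged with at least one allowed community.

    An empty allow-list yields no routes, mirroring a Microsoft peering
    with no route filter attached.
    """
    allowed_set = frozenset(allowed)
    return [r for r in routes if r.communities & allowed_set]


# Hypothetical routes and placeholder community values, for illustration only.
ROUTES = [
    Route("198.51.100.0/24", frozenset({"12076:5010"})),
    Route("203.0.113.0/24", frozenset({"12076:5020"})),
    Route("192.0.2.0/24", frozenset({"12076:5030"})),
]

if __name__ == "__main__":
    for route in apply_route_filter(ROUTES, ["12076:5010", "12076:5020"]):
        print(route.prefix)
```

With the two allowed communities from the rule above, only the first two prefixes survive. The third, tagged with a community that is not on the list, is never advertised to you.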

Step 4 - FastPath for the gateway-bypass data path

The ExpressRoute gateway normally sits in the data path and is a throughput and latency bottleneck - it caps at the gateway SKU’s bandwidth and adds a hop. FastPath changes this: it programs the MSEE to send data-plane traffic directly to VMs in the VNet, bypassing the gateway entirely. Control-plane (BGP, route exchange) still flows through the gateway; only the packets skip it.

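One requirement worth internalizing: FastPath needs a top-tier ExpressRoute gateway (Ultra Performance or ErGw3AZ); the smaller gateway SKUs cannot use it.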

Enable it on the connection:

az network vpn-connection update \
  --name cn-er-hub \
  --resource-group rg-connectivity \
  --set expressRouteGatewayBypass=true

For latency-sensitive workloads (trading, real-time media, chatty database replication) FastPath is the difference between “ExpressRoute is fast” and “ExpressRoute is fast and consistent,” because you remove the gateway’s variable queuing from every flow.

Designing VPN-as-backup: connection weight and AS-path prepend

ExpressRoute has no SLA value if a single fiber cut takes you offline. The supported, cost-effective backup is a Site-to-Site VPN over the internet terminating on a VPN gateway, coexisting with the ExpressRoute gateway. Microsoft supports ExpressRoute and VPN coexistence; the design goal is: ExpressRoute is always preferred, VPN carries traffic only when ExpressRoute is down.

Two independent knobs make that happen, one per direction:

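Azure-to-on-prem (return path). Azure chooses between the two gateways. For identical prefixes Azure already prefers the ExpressRoute-learned route over the VPN-learned one; setting the connection routing weight makes that intent explicit, and higher weight wins. Set the ExpressRoute connection to a higher weight than the VPN connection.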

# Prefer ExpressRoute outbound from Azure
az network vpn-connection update --name cn-er-hub \
  --resource-group rg-connectivity --set routingWeight=20000

az network vpn-connection update --name cn-vpn-backup \
  --resource-group rg-connectivity --set routingWeight=100

On-prem-to-Azure (forward path). Your on-prem routers decide. Azure advertises the same VNet prefixes over both ExpressRoute private peering and the VPN BGP session, so you must make the VPN-learned routes less attractive. The clean, vendor-neutral lever is AS-path prepending on the VPN session: Azure (or you, on the VPN tunnel’s BGP config) prepends extra ASNs so the path looks longer and BGP prefers the shorter ExpressRoute path.

! On the VPN tunnel BGP neighbor, make inbound VPN routes less preferred.
! Option A: local-preference (preferred within your AS, deterministic)
route-map FROM_AZURE_VPN permit 10
 set local-preference 80
router bgp 65010
 neighbor <azure-vpn-bgp-peer-ip> route-map FROM_AZURE_VPN in
!
! ExpressRoute-learned routes keep the default local-pref 100 and win.

Local-preference is the more reliable lever on your own side because it is evaluated before AS-path in the BGP decision process and is not affected by what Azure does upstream. Use AS-path prepend (configurable on the Azure VPN gateway’s BGP settings) when you cannot touch on-prem policy, but prefer local-preference when you control the edge.
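
The ordering of those two levers is easier to see in a toy model of the decision process. The sketch below implements only the two steps discussed here (highest local-preference, then shortest AS path) and omits every other tie-breaker a real BGP implementation applies. The AS paths are illustrative assumptions: 12076 is the MSEE ASN, and 65515 is used here as the Azure VPN gateway ASN.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Path:
    via: str
    local_pref: int
    as_path: Tuple[int, ...]


def best_path(paths: Iterable[Path]) -> Path:
    """Pick the best path: highest local-pref first, then shortest AS path.

    Simplified on purpose; vendor weight, origin, MED and later steps are omitted.
    """
    return min(paths, key=lambda p: (-p.local_pref, len(p.as_path)))


if __name__ == "__main__":
    er = Path("expressroute", local_pref=100, as_path=(12076,))
    vpn_low_pref = Path("vpn", local_pref=80, as_path=(65515,))
    vpn_prepended = Path("vpn", local_pref=100, as_path=(65515, 65515, 65515))

    print(best_path([er, vpn_low_pref]).via)   # local-pref decides
    print(best_path([er, vpn_prepended]).via)  # AS-path length decides
    print(best_path([vpn_low_pref]).via)       # ExpressRoute withdrawn
```

Because local-preference is compared first, the ExpressRoute path keeps winning even if its AS path happens to be longer than the VPN one. AS-path prepending only has an effect when local-preference is equal on both paths.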

Do not advertise a 0.0.0.0/0 default over the VPN as a lazy backup. If a default route leaks, it competes with ExpressRoute’s specific routes only when ExpressRoute withdraws - which sounds fine - but it also risks pulling internet-bound traffic into the tunnel. Advertise the same specific VNet prefixes over both paths and let longest-prefix-match plus your weight/local-pref policy do the selection.
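
The default-route hazard is a longest-prefix-match effect, which a few lines can demonstrate. This is a sketch under stated assumptions: 10.100.0.0/16 is a hypothetical VNet prefix, 198.51.100.7 stands in for an internet destination, and the next-hop labels are plain strings rather than real gateway objects.

```python
import ipaddress
from typing import Dict, Optional


def lookup(table: Dict[str, str], dest: str) -> Optional[str]:
    """Return the next hop of the most specific prefix containing dest, or None."""
    addr = ipaddress.ip_address(dest)
    best_len = -1
    best_hop = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, next_hop
    return best_hop


if __name__ == "__main__":
    # Lazy design: specific prefix over ExpressRoute, default route over the VPN.
    lazy = {"10.100.0.0/16": "expressroute", "0.0.0.0/0": "vpn"}
    print(lookup(lazy, "10.100.4.10"))   # VNet traffic takes ExpressRoute
    print(lookup(lazy, "198.51.100.7"))  # internet-bound traffic is pulled into the tunnel

    # After ExpressRoute withdraws its route, only the default remains.
    print(lookup({"0.0.0.0/0": "vpn"}, "10.100.4.10"))
```

The second lookup is the problem case: with a default route on the tunnel, anything without a more specific route follows it, whether or not ExpressRoute is healthy.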

Verify

Confirm steady state (ExpressRoute carrying traffic) and that the backup is healthy but idle.

# 1. Both BGP sessions on private peering are advertising/receiving routes
az network express-route list-route-tables \
  --resource-group rg-connectivity --name erc-prod-svc \
  --path primary --peering-name AzurePrivatePeering -o table

# 2. Effective routes on a test NIC show next hop = ExpressRoute, not VPN
az network nic show-effective-route-table \
  --name nic-probe01 --resource-group rg-workload -o table

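In the effective route table, gateway-learned routes show Next Hop Type: VirtualNetworkGateway whether they arrived over ExpressRoute or over the VPN, so read the next hop IP address to tell them apart: ExpressRoute-learned prefixes list the MSEE next-hop addresses, while VPN-learned prefixes list the VPN gateway instance address in GatewaySubnet. If your on-prem prefixes point at the VPN gateway while ExpressRoute is up, your weight/local-pref policy is wrong.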

From an on-prem host, trace the path and confirm it does not traverse the VPN concentrator’s public IP:

# Linux on-prem host; target the private IP of an Azure VM, not an on-prem address
mtr -rwzbc 20 <azure-vm-private-ip>
# Confirm low, stable latency and a path through your ER edge router

Validate failover by administratively shutting the ExpressRoute BGP sessions (do this in a window):

! On the on-prem edge router; the neighbors are the MSEE peers (second usable address in each /30)
router bgp 65010
 neighbor 192.168.10.2 shutdown
 neighbor 192.168.10.6 shutdown

Re-run the effective-route-table check; the same prefixes should now list the VPN gateway instance as the next hop IP instead of the MSEE addresses (Next Hop Type stays VirtualNetworkGateway in both cases). Convergence is BGP-driven: with default timers, expect failover on the order of tens of seconds to ~3 minutes depending on hold timers and whether BFD is in play. Measure it with a continuous ping during the cutover and record the gap. Restore by removing the shutdown on both neighbors.
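
Recording the gap is more reliable when it is computed from the probe log rather than eyeballed. The following is a minimal sketch that assumes you have already parsed your ping output into time-ordered `(timestamp_seconds, reply_received)` pairs; the parsing itself and the function name are left as assumptions.

```python
from typing import Iterable, Tuple


def longest_outage(samples: Iterable[Tuple[float, bool]]) -> float:
    """Return the longest gap, in seconds, between the last good reply before a
    run of losses and the first good reply after it.

    An outage still in progress at the end of the samples is not counted.
    """
    longest = 0.0
    last_ok = None
    in_outage = False
    for timestamp, ok in samples:
        if ok:
            if in_outage and last_ok is not None:
                longest = max(longest, timestamp - last_ok)
            last_ok = timestamp
            in_outage = False
        else:
            in_outage = True
    return longest


if __name__ == "__main__":
    # One probe per second; replies are lost from t=5 up to and including t=34.
    probes = [(float(t), not (5 <= t < 35)) for t in range(60)]
    print(f"failover gap: {longest_outage(probes):.0f} s")
```

Probing once per second bounds the measurement error to about a second on each side of the gap, which is usually enough to compare against your hold-timer expectations.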

Resiliency and monitoring checklist

Monitoring and the 4,000-route limit

Private peering accepts a maximum of 4,000 IPv4 routes on Standard and 10,000 on Premium. Blow past it and the BGP session drops - not gracefully degrades, drops - taking down that whole peering. The usual cause is redistributing your entire on-prem routing table into BGP instead of summarized supernets. Always advertise summaries (e.g. one 10.50.0.0/16 rather than 256 /24s) and watch the count.
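
The summarization advice is easy to check before you touch a router. This sketch uses Python's standard `ipaddress.collapse_addresses` to merge contiguous prefixes and compares the resulting count with the limits quoted above; the 80% warning threshold and the function names are assumptions for illustration.

```python
import ipaddress
from typing import Iterable, List

STANDARD_LIMIT = 4000   # private peering IPv4 route limit, Standard
PREMIUM_LIMIT = 10000   # private peering IPv4 route limit, Premium


def summarize(prefixes: Iterable[str]) -> List[str]:
    """Collapse contiguous or overlapping prefixes into the fewest covering networks."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


def near_limit(route_count: int, limit: int = STANDARD_LIMIT,
               fraction: float = 0.8) -> bool:
    """True when the advertised route count has reached the warning fraction of the limit."""
    return route_count >= limit * fraction


if __name__ == "__main__":
    # 256 contiguous /24s under 10.50.0.0/16, as in the example above.
    site_routes = [f"10.50.{i}.0/24" for i in range(256)]
    summary = summarize(site_routes)
    print(len(site_routes), "->", summary)
    print("near Standard limit:", near_limit(len(summary)))
```

Collapsing only merges prefixes that are actually contiguous, so a result that is still large is a sign your address plan, not your router config, needs the work.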

Track route counts and session health with the circuit’s metrics and ARP/route tables on a schedule. Wire alerts on the platform metrics that actually predict an outage:

# Alert if a BGP session on the circuit goes unavailable
az monitor metrics alert create \
  --name alrt-er-bgp-down \
  --resource-group rg-connectivity \
  --scopes "$(az network express-route show -g rg-connectivity -n erc-prod-svc --query id -o tsv)" \
  --condition "avg BgpAvailability < 100" \
  --window-size 5m --evaluation-frequency 1m \
  --description "ExpressRoute private peering BGP availability dropped"

Add Connection Monitor (Network Watcher) end-to-end probes from on-prem to an Azure VM over the circuit so you alert on real reachability and latency, not just control-plane state. Graph BitsInPerSecond/BitsOutPerSecond against your bandwidth to catch saturation before users do.

The single most common ExpressRoute outage I have root-caused is a non-redundant “redundant” circuit: one BGP session quietly down for weeks, then the surviving MSEE goes into maintenance and everything fails at once. Alert on per-session availability, not aggregate circuit state, and your future self will thank you.

Enterprise scenario

A payments platform we ran had ExpressRoute private peering plus a Site-to-Site VPN backup, weights set correctly, and a quarterly failover drill that always passed. Then a real fiber cut hit the peering location at 02:00. ExpressRoute withdrew, VPN took over - and throughput collapsed. Batch settlement that ran in 40 minutes on ExpressRoute was projected at 6+ hours over the tunnel. The team had validated routing failover but never capacity failover.

Root cause was two-fold. First, the VPN gateway was a VpnGw1 (650 Mbps aggregate) backing a 1 Gbps circuit that regularly ran at 700 Mbps - the backup was structurally undersized. Second, a single IPsec tunnel is capped around 1.25 Gbps and, more importantly, a single TCP flow rides one tunnel; the chatty replication never spread across tunnels.

The fix was to right-size the backup as a first-class path, not a token one. They moved to VpnGw3 with active-active mode (two gateway instances, two public IPs, two tunnels to on-prem) so flows hash across tunnels, and validated with iperf during a window, not just ping.

az network vnet-gateway update \
  --name vpngw-backup --resource-group rg-connectivity \
  --sku VpnGw3 --set activeActive=true

The lesson that went into every design review afterward: a backup path you have only tested for reachability is not tested. Drive production-representative load across it - ER weight/local-pref still keeps it idle in steady state, so the drill costs nothing but an iperf run.
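
A back-of-the-envelope capacity check would have flagged the problem before the outage. The sketch below is arithmetic only and rests on a loud assumption: that the settlement batch moved data at roughly the 700 Mbps figure quoted above for its full 40 minutes. The function names are hypothetical.

```python
def data_volume_gbit(rate_mbps: float, minutes: float) -> float:
    """Data moved, in gigabits, at a steady rate over a duration."""
    return rate_mbps * minutes * 60 / 1000


def transfer_minutes(volume_gbit: float, rate_mbps: float) -> float:
    """Minutes needed to move a volume at a steady rate."""
    return volume_gbit * 1000 / (rate_mbps * 60)


def implied_rate_mbps(volume_gbit: float, minutes: float) -> float:
    """Steady rate implied by moving a volume in a given time."""
    return volume_gbit * 1000 / (minutes * 60)


if __name__ == "__main__":
    volume = data_volume_gbit(700, 40)  # assumed batch profile: 700 Mbps for 40 minutes
    print(f"batch volume: {volume:.0f} Gbit")
    print(f"at VpnGw1 aggregate (650 Mbps): {transfer_minutes(volume, 650):.1f} min")
    print(f"rate implied by a 6 h projection: {implied_rate_mbps(volume, 360):.1f} Mbps")
```

Under that assumption, the aggregate gateway figure alone would stretch the batch only slightly past 40 minutes. A 6-hour projection implies an effective rate far below the gateway's aggregate number, which is consistent with the single-flow, single-tunnel bottleneck described above and is exactly what an iperf run exposes and a ping does not.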
