A Site-to-Site VPN to Azure that survives a tunnel drop without a human in the loop is not a checkbox — it is an active-active gateway, two tunnels, and BGP doing the route arithmetic. This guide builds that end to end with Terraform, hardens the crypto so you are not running 2014 defaults, and shows exactly how to prove failover with a real reconvergence test.
Static routing vs BGP: why dynamic routing is non-negotiable for HA
A static-route S2S connection carries a hard-coded list of on-prem prefixes on the local network gateway. It works for a single tunnel. It falls apart the moment you want high availability, because static routes have no concept of liveness. If the active tunnel dies, Azure keeps the static route pointing at a dead path until something — a person, a script — intervenes. There is no automatic next-best path.
BGP changes the model entirely. Instead of declaring prefixes statically, both sides advertise their routes over the tunnel and withdraw them when the session drops. With an active-active gateway you get two independent IPsec tunnels (one per gateway instance), each carrying its own BGP session. When a tunnel flaps, its BGP session times out, those routes are withdrawn, and traffic shifts to the surviving tunnel’s advertised paths automatically. No NSG edit, no portal click.
| Concern | Static routing | BGP |
|---|---|---|
| Failover | Manual / scripted | Automatic on session loss |
| On-prem prefix changes | Edit local network gateway, re-apply | Advertised dynamically, no Azure change |
| Active-active gateway | Limited value | The whole point |
| Transit / multi-site | Painful | Native path selection |
Rule of thumb: if you have an active-active gateway and are not running BGP, you have paid for two instances and one of them is doing nothing useful for failover. The two halves only become an HA pair when BGP can withdraw the dead path.
Topology: active-active gateway, dual tunnels, and ASN/APIPA planning
An active-active gateway gets two public IPs and two BGP peer addresses, one per instance. Your on-prem device builds two tunnels, one to each Azure public IP, and forms a BGP session over each. The result is two independent tunnels, each carrying its own BGP session.
```text
on-prem VPN device (ASN 65010)
   | tunnel 1 -> Azure GW instance 0 (PIP-1, BGP peer A)
   | tunnel 2 -> Azure GW instance 1 (PIP-2, BGP peer B)
Azure active-active VNet gateway (ASN 65515)
```
Two planning decisions to nail before you touch any resource:
ASN selection. Azure VPN gateways default to ASN 65515. Your on-prem side needs a different ASN. Use a private ASN — the 16-bit private range is 64512–65534, or the 32-bit private range 4200000000–4294967294. Azure also reserves a handful of ASNs you cannot use on-prem (notably 65515, 65517, 65518, 65519, 65520). Pick something clean like 65010 for on-prem and leave Azure on 65515 unless you have a reason to change it.
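To have Terraform reject a bad on-prem ASN before anything reaches Azure, you can encode the ranges above as validation rules on a variable. This is a minimal sketch. The variable name `onprem_asn` is an assumption. You would wire it in with `asn = var.onprem_asn` in the local network gateway's `bgp_settings`.

```hcl
# Sketch: on-prem ASN as a validated input (variable name is hypothetical).
variable "onprem_asn" {
  description = "Private BGP ASN of the on-prem VPN device; must differ from the Azure side."
  type        = number
  default     = 65010

  validation {
    # 16-bit private range 64512-65534 or 32-bit private range 4200000000-4294967294
    condition = (
      (var.onprem_asn >= 64512 && var.onprem_asn <= 65534) ||
      (var.onprem_asn >= 4200000000 && var.onprem_asn <= 4294967294)
    )
    error_message = "onprem_asn must be in 64512-65534 or 4200000000-4294967294."
  }

  validation {
    # ASNs Azure reserves; none of them may be used on the on-prem side
    condition     = !contains([65515, 65517, 65518, 65519, 65520], var.onprem_asn)
    error_message = "onprem_asn collides with an ASN Azure reserves (65515, 65517-65520)."
  }
}
```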
BGP peering addresses (APIPA). By default Azure derives the BGP peer IP from the GatewaySubnet range. That works if your on-prem device peers from a real address. Some peers, however, require APIPA (link-local 169.254.x.x) BGP addresses for S2S. AWS and Google Cloud VPN endpoints do, and so do many appliance configurations. In that case you must use Azure’s reserved APIPA range, 169.254.21.0 to 169.254.22.255, and you cannot expand it. For an active-active gateway you assign one APIPA address per instance.
A workable APIPA plan:
| Endpoint | BGP address |
|---|---|
| Azure GW instance 0 | 169.254.21.1 |
| Azure GW instance 1 | 169.254.21.5 |
| On-prem peer (for tunnel 1) | 169.254.21.2 |
| On-prem peer (for tunnel 2) | 169.254.21.6 |
Keep each tunnel’s pair in its own little subnet mentally (.1/.2 and .5/.6, effectively two /30s). Mismatched APIPA peers are the single most common reason a BGP session refuses to come up while the IPsec SA shows connected.
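You can keep the same plan in Terraform so that the Azure and on-prem halves come from a single source. This sketch assumes you carve each pair out of its own /30. The local names are hypothetical, and `cidrhost()` returns host 1 and host 2 of each block.

```hcl
# Sketch: derive the APIPA plan from two /30s so each tunnel's pair stays together.
# The local names are hypothetical.
locals {
  apipa_tunnel_nets = ["169.254.21.0/30", "169.254.21.4/30"]

  apipa_plan = {
    for i, net in local.apipa_tunnel_nets : "tunnel${i + 1}" => {
      azure  = cidrhost(net, 1) # Azure gateway instance i
      onprem = cidrhost(net, 2) # on-prem peer for that tunnel
    }
  }
  # Resulting value:
  #   tunnel1 = { azure = "169.254.21.1", onprem = "169.254.21.2" }
  #   tunnel2 = { azure = "169.254.21.5", onprem = "169.254.21.6" }
}
```

You would then reference, for example, `local.apipa_plan.tunnel1.azure` in the gateway's `peering_addresses` and `local.apipa_plan.tunnel1.onprem` in the local network gateway. Deriving both ends from the same /30 makes it hard to cross-wire a pair.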
Step 1 - Deploy the active-active VPN gateway and local network gateway
Active-active requires a GatewaySubnet (Azure mandates this exact name) of at least /27, two public IPs, and an SKU that supports active-active and BGP. VpnGw1 and above qualify; the legacy Basic SKU does not support BGP at all. Use a zone-redundant generation 2 SKU like VpnGw2AZ for production.
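The snippets in this guide reference `azurerm_resource_group.hub` and `azurerm_virtual_network.hub` without defining them. The following minimal scaffold makes three assumptions:

- azurerm provider 4.x, because the snippets use the 4.x attribute name `bgp_enabled`.
- A placeholder region.
- A hub address space that contains the GatewaySubnet prefix.

The resource group name matches the `rg-hub-prod` used in the CLI checks later.

```hcl
# Scaffold assumed by the later snippets; region and address space are placeholders.
terraform {
  required_version = ">= 1.3"
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0" # 4.x names the BGP flag bgp_enabled
    }
  }
}

provider "azurerm" {
  features {}
  # azurerm 4.x needs a subscription: set ARM_SUBSCRIPTION_ID or subscription_id here
}

resource "azurerm_resource_group" "hub" {
  name     = "rg-hub-prod"
  location = "westeurope"
}

resource "azurerm_virtual_network" "hub" {
  name                = "vnet-hub-prod"
  resource_group_name = azurerm_resource_group.hub.name
  location            = azurerm_resource_group.hub.location
  address_space       = ["10.0.0.0/16"] # must contain the 10.0.255.0/27 GatewaySubnet
}
```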
```hcl
resource "azurerm_subnet" "gateway" {
  name                 = "GatewaySubnet" # name is mandatory and case-sensitive
  resource_group_name  = azurerm_resource_group.hub.name
  virtual_network_name = azurerm_virtual_network.hub.name
  address_prefixes     = ["10.0.255.0/27"]
}

resource "azurerm_public_ip" "vpngw" {
  count               = 2
  name                = "pip-vpngw-${count.index}"
  resource_group_name = azurerm_resource_group.hub.name
  location            = azurerm_resource_group.hub.location
  allocation_method   = "Static"
  sku                 = "Standard"
  zones               = ["1", "2", "3"]
}
```
Now the gateway. The two ip_configuration blocks plus active_active = true are what make it an HA pair. bgp_enabled = true turns on dynamic routing, and the bgp_settings block assigns the per-instance APIPA addresses via peering_addresses, each tied to its ip_configuration by name.
```hcl
resource "azurerm_virtual_network_gateway" "this" {
  name                = "vpngw-hub-prod"
  resource_group_name = azurerm_resource_group.hub.name
  location            = azurerm_resource_group.hub.location
  type                = "Vpn"
  vpn_type            = "RouteBased"
  sku                 = "VpnGw2AZ"
  generation          = "Generation2" # the generation 2 SKU recommended above
  active_active       = true
  bgp_enabled         = true

  ip_configuration {
    name                          = "vnetGatewayConfig0"
    public_ip_address_id          = azurerm_public_ip.vpngw[0].id
    private_ip_address_allocation = "Dynamic"
    subnet_id                     = azurerm_subnet.gateway.id
  }

  ip_configuration {
    name                          = "vnetGatewayConfig1"
    public_ip_address_id          = azurerm_public_ip.vpngw[1].id
    private_ip_address_allocation = "Dynamic"
    subnet_id                     = azurerm_subnet.gateway.id
  }

  bgp_settings {
    asn = 65515 # Azure side ASN (default)

    peering_addresses {
      ip_configuration_name = "vnetGatewayConfig0"
      apipa_addresses       = ["169.254.21.1"]
    }

    peering_addresses {
      ip_configuration_name = "vnetGatewayConfig1"
      apipa_addresses       = ["169.254.21.5"]
    }
  }
}
```
Gateway creation takes 30-45 minutes. Plan for it. Run this early and let it bake while you stage the rest of the config. The provider will sit on the apply the whole time.
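While the gateway builds, you can expose what the on-prem team needs for Step 4: the two tunnel destinations and the Azure-side APIPA peers. This sketch uses Terraform outputs, and the output names are illustrative.

```hcl
# Values the on-prem side needs; output names are illustrative.
output "azure_vpn_public_ips" {
  description = "Tunnel destinations: instance 0 first, then instance 1."
  value       = azurerm_public_ip.vpngw[*].ip_address
}

output "azure_bgp_peers" {
  description = "Azure-side APIPA BGP addresses keyed by gateway ip_configuration."
  value = {
    for pa in azurerm_virtual_network_gateway.this.bgp_settings[0].peering_addresses :
    pa.ip_configuration_name => pa.apipa_addresses
  }
}

output "azure_bgp_asn" {
  value = azurerm_virtual_network_gateway.this.bgp_settings[0].asn
}
```

After apply, `terraform output azure_vpn_public_ips` gives the values for `<AZURE_PIP_1>` and `<AZURE_PIP_2>`.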
The local network gateway represents your on-prem side. With BGP, you do not list on-prem prefixes here. Instead you set the on-prem ASN and BGP peer address and let routing do the rest. With APIPA peering, the local network gateway’s bgp_settings block carries the on-prem APIPA peer IP.
```hcl
resource "azurerm_local_network_gateway" "onprem" {
  name                = "lng-onprem-dc1"
  resource_group_name = azurerm_resource_group.hub.name
  location            = azurerm_resource_group.hub.location

  # Public IP of the on-prem VPN device (the outside interface)
  gateway_address = "203.0.113.10"

  # With BGP + APIPA you do not enumerate prefixes here; advertise via BGP.
  # address_space is omitted intentionally for a pure-BGP design.

  bgp_settings {
    asn                 = 65010          # on-prem ASN, must differ from Azure
    bgp_peering_address = "169.254.21.2" # on-prem APIPA peer for tunnel 1
  }
}
```
For a fully redundant on-prem edge with two devices, create a second local network gateway (e.g. lng-onprem-dc2) with the second device’s public IP and 169.254.21.6 as its BGP peer address. The tunnel2 connection in Step 3 references it as onprem_t2. The Azure-side half of each APIPA pairing is chosen on the connection itself, in Step 3.
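The tunnel2 connection in Step 3 points at `azurerm_local_network_gateway.onprem_t2`, which the guide never defines. The following is a sketch of it for the two-device case. 203.0.113.20 is a placeholder from the documentation address range, standing in for the second device's outside interface.

```hcl
# Sketch of the second on-prem edge; 203.0.113.20 is a placeholder public IP.
resource "azurerm_local_network_gateway" "onprem_t2" {
  name                = "lng-onprem-dc2"
  resource_group_name = azurerm_resource_group.hub.name
  location            = azurerm_resource_group.hub.location
  gateway_address     = "203.0.113.20" # outside interface of the second device

  bgp_settings {
    asn                 = 65010          # same on-prem ASN as dc1
    bgp_peering_address = "169.254.21.6" # on-prem APIPA peer for tunnel 2
  }
}
```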
Step 2 - Configure BGP peering and advertising on-prem prefixes
There is nothing extra to “turn on” for advertising once BGP is enabled — Azure automatically advertises the VNet address space (and, in hub-spoke with useRemoteGateways, peered spoke ranges) to your on-prem peer. Your job is to control what on-prem advertises back.
A clean production stance:
- On-prem advertises its summarized corporate prefixes (for example 192.168.0.0/16, or a tighter 192.168.10.0/24 per site) over both tunnels.
- Azure advertises the hub VNet plus spokes. Confirm the spoke peerings have gateway transit enabled so spoke ranges propagate.
- Prefer route summarization on-prem so you advertise a handful of aggregates, not hundreds of /24s. Azure VPN gateways enforce a route limit, and a sprawling table is a flap waiting to happen.
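On the spoke side, the peering pair has to offer and consume the hub gateway. This sketch assumes a single hypothetical spoke VNet (`vnet-spoke-app1`, 10.1.0.0/16) in the hub resource group. Two flags let spoke ranges reach on-prem over BGP: `allow_gateway_transit` on the hub side and `use_remote_gateways` on the spoke side.

```hcl
# Sketch: hub/spoke peering so spoke ranges propagate to on-prem.
# The spoke VNet, its name, and 10.1.0.0/16 are hypothetical.
resource "azurerm_virtual_network" "spoke" {
  name                = "vnet-spoke-app1"
  resource_group_name = azurerm_resource_group.hub.name
  location            = azurerm_resource_group.hub.location
  address_space       = ["10.1.0.0/16"]
}

resource "azurerm_virtual_network_peering" "hub_to_spoke" {
  name                      = "peer-hub-to-spoke-app1"
  resource_group_name       = azurerm_resource_group.hub.name
  virtual_network_name      = azurerm_virtual_network.hub.name
  remote_virtual_network_id = azurerm_virtual_network.spoke.id
  allow_forwarded_traffic   = true
  allow_gateway_transit     = true # hub offers its VPN gateway to the spoke
}

resource "azurerm_virtual_network_peering" "spoke_to_hub" {
  name                      = "peer-spoke-app1-to-hub"
  resource_group_name       = azurerm_resource_group.hub.name
  virtual_network_name      = azurerm_virtual_network.spoke.name
  remote_virtual_network_id = azurerm_virtual_network.hub.id
  allow_forwarded_traffic   = true
  use_remote_gateways       = true # spoke sends on-prem traffic via the hub gateway

  # use_remote_gateways needs the hub gateway to exist first
  depends_on = [azurerm_virtual_network_gateway.this]
}
```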
You will validate the learned and advertised routes in the Verify section using az network vnet-gateway list-learned-routes and list-advertised-routes. A stock VPN gateway has no Azure-side knob for adding custom advertised prefixes beyond what the VNet/peering topology defines. Control the advertisement story on the on-prem side and through your peering design.
If you need to advertise a summary route to on-prem that does not match a VNet range (a common need when fronting Azure with custom aggregates), that is a Route Server / NVA pattern, not a stock VPN-gateway feature. Do not expect the VPN gateway to synthesize arbitrary aggregates.
Step 3 - Harden the connection with a custom IPsec/IKE policy (no defaults)
By default Azure negotiates from a broad list of IKE/IPsec proposals that still includes weak options (DES, SHA1, DH Group 2). For anything production or regulated, pin a single strong policy on the connection with an ipsec_policy block. When you specify one, Azure stops offering the default set and proposes only what you declare — so the on-prem device must match it exactly.
A strong, broadly interoperable AES-GCM policy:
```hcl
resource "azurerm_virtual_network_gateway_connection" "tunnel1" {
  name                       = "cn-onprem-dc1"
  resource_group_name        = azurerm_resource_group.hub.name
  location                   = azurerm_resource_group.hub.location
  type                       = "IPsec"
  virtual_network_gateway_id = azurerm_virtual_network_gateway.this.id
  local_network_gateway_id   = azurerm_local_network_gateway.onprem.id
  connection_protocol        = "IKEv2"
  bgp_enabled                = true
  shared_key                 = var.vpn_shared_key # pull from Key Vault, never hardcode
  dpd_timeout_seconds        = 45

  ipsec_policy {
    # IKE Phase 1
    ike_encryption = "GCMAES256"
    ike_integrity  = "GCMAES256" # with GCMAES encryption, integrity must match
    dh_group       = "DHGroup14"

    # IPsec Phase 2 (ESP)
    ipsec_encryption = "GCMAES256"
    ipsec_integrity  = "GCMAES256"
    pfs_group        = "PFS14"   # enable Perfect Forward Secrecy
    sa_lifetime      = 3600      # seconds; rekey hourly
    sa_datasize      = 102400000 # KB
  }
}
```
A few accuracy points that bite people:
- GCM integrity pairing. When `ike_encryption`/`ipsec_encryption` is a `GCMAES*` algorithm, the matching integrity field must be the same `GCMAES*` value. Pairing `GCMAES256` encryption with `SHA256` integrity is rejected.
- PFS is opt-in. Setting `pfs_group = "PFS14"` (DH Group 14 for the Phase 2 rekey) is what actually enables Perfect Forward Secrecy. `None` disables it.
- One policy, both ends. A custom policy on the Azure connection means the on-prem device must propose the identical transform set or Phase 1 never completes. Decide the policy once and configure both sides from it.
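To decide the policy once, you could define a single typed variable that both connections read. Its validation rules encode the two accuracy points above. This is a sketch. The variable name and the rule against `pfs_group = "None"` are assumptions about house policy, and `startswith()` requires Terraform 1.3 or later.

```hcl
# Sketch: one crypto policy object shared by both connections (name is hypothetical).
variable "vpn_crypto_policy" {
  type = object({
    ike_encryption   = string
    ike_integrity    = string
    dh_group         = string
    ipsec_encryption = string
    ipsec_integrity  = string
    pfs_group        = string
    sa_lifetime      = number
    sa_datasize      = number
  })
  default = {
    ike_encryption   = "GCMAES256"
    ike_integrity    = "GCMAES256"
    dh_group         = "DHGroup14"
    ipsec_encryption = "GCMAES256"
    ipsec_integrity  = "GCMAES256"
    pfs_group        = "PFS14"
    sa_lifetime      = 3600
    sa_datasize      = 102400000
  }

  validation {
    # GCMAES* encryption must be paired with the identical GCMAES* integrity value
    condition = (
      (!startswith(var.vpn_crypto_policy.ike_encryption, "GCMAES") ||
      var.vpn_crypto_policy.ike_integrity == var.vpn_crypto_policy.ike_encryption) &&
      (!startswith(var.vpn_crypto_policy.ipsec_encryption, "GCMAES") ||
      var.vpn_crypto_policy.ipsec_integrity == var.vpn_crypto_policy.ipsec_encryption)
    )
    error_message = "With GCMAES encryption, the matching integrity field must be the same GCMAES value."
  }

  validation {
    # PFS is opt-in; refuse a policy that silently disables it
    condition     = var.vpn_crypto_policy.pfs_group != "None"
    error_message = "pfs_group is None, which disables Perfect Forward Secrecy."
  }
}
```

Each connection's `ipsec_policy` block then sets `ike_encryption = var.vpn_crypto_policy.ike_encryption` and so on. The same object doubles as the checklist for the on-prem transform set.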
For the second tunnel, create a second connection against the second local network gateway (onprem_t2). The custom_bgp_addresses block pins which Azure-side APIPA address each gateway instance uses for this connection’s BGP sessions. primary must be an APIPA address of the first ip_configuration (instance 0), and secondary one of the second (instance 1).
```hcl
resource "azurerm_virtual_network_gateway_connection" "tunnel2" {
  name                       = "cn-onprem-dc1-t2"
  resource_group_name        = azurerm_resource_group.hub.name
  location                   = azurerm_resource_group.hub.location
  type                       = "IPsec"
  virtual_network_gateway_id = azurerm_virtual_network_gateway.this.id
  local_network_gateway_id   = azurerm_local_network_gateway.onprem_t2.id
  connection_protocol        = "IKEv2"
  bgp_enabled                = true
  shared_key                 = var.vpn_shared_key
  dpd_timeout_seconds        = 45

  # Azure-side APIPA address each gateway instance uses for this connection's BGP:
  # primary must come from vnetGatewayConfig0, secondary from vnetGatewayConfig1
  custom_bgp_addresses {
    primary   = "169.254.21.1"
    secondary = "169.254.21.5"
  }

  ipsec_policy {
    ike_encryption   = "GCMAES256"
    ike_integrity    = "GCMAES256"
    dh_group         = "DHGroup14"
    ipsec_encryption = "GCMAES256"
    ipsec_integrity  = "GCMAES256"
    pfs_group        = "PFS14"
    sa_lifetime      = 3600
    sa_datasize      = 102400000
  }
}
```
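Both connections reference `var.vpn_shared_key`, but the guide never declares it. Below is a minimal declaration that marks it sensitive. The 32-character floor is a house rule assumed for this sketch, not an Azure limit.

```hcl
# Sketch: declare the pre-shared key both connections reference as var.vpn_shared_key.
variable "vpn_shared_key" {
  description = "IPsec pre-shared key; supplied at apply time, never committed."
  type        = string
  sensitive   = true # redacted from plan/apply output, but still written to state

  validation {
    # House rule for this sketch, not an Azure requirement
    condition     = length(var.vpn_shared_key) >= 32
    error_message = "vpn_shared_key should be at least 32 characters."
  }
}
```

To honor the Key Vault comment, you could populate the variable at apply time. For example, `export TF_VAR_vpn_shared_key="$(az keyvault secret show --vault-name kv-hub-prod --name vpn-psk-dc1 --query value -o tsv)"`, where the vault and secret names are hypothetical. `sensitive` only redacts CLI output. The value still lands in Terraform state, so protect the state backend accordingly.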
Step 4 - On-prem device config: tunnel, BGP, and dead-peer detection
The on-prem side must mirror everything: two tunnels (one per Azure PIP), the exact crypto policy, APIPA BGP peers, and DPD. Vendors differ in syntax, but the values are fixed by what you set in Azure. Here is a representative Cisco IOS-style configuration for one tunnel. Replicate it for the second tunnel toward the other PIP (Tunnel2 with address 169.254.21.6, peering with Azure instance 1 at 169.254.21.5).
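```
! Phase 1 (IKEv2) - must match the Azure ipsec_policy exactly
crypto ikev2 proposal AZURE-P1
 encryption aes-gcm-256
 prf sha256
 group 14
!
crypto ikev2 policy AZURE-POLICY
 proposal AZURE-P1
!
! Pre-shared key: the same value as the connection's shared_key
crypto ikev2 keyring AZURE-KEYRING
 peer AZURE-GW0
  address <AZURE_PIP_1>
  pre-shared-key <SHARED_KEY>
!
crypto ikev2 profile AZURE-IKEV2-PROFILE
 match identity remote address <AZURE_PIP_1> 255.255.255.255
 authentication remote pre-share
 authentication local pre-share
 keyring local AZURE-KEYRING
!
! Phase 2 (ESP) - AES-GCM-256, PFS group 14
crypto ipsec transform-set AZURE-P2 esp-gcm 256
 mode tunnel
!
crypto ipsec profile AZURE-PROFILE
 set transform-set AZURE-P2
 set ikev2-profile AZURE-IKEV2-PROFILE
 set pfs group14
 set security-association lifetime seconds 3600
!
! Tunnel to Azure gateway instance 0 (PIP-1)
interface Tunnel1
 ip address 169.254.21.2 255.255.255.255
 tunnel source GigabitEthernet0/0
 tunnel mode ipsec ipv4
 tunnel destination <AZURE_PIP_1>
 tunnel protection ipsec profile AZURE-PROFILE
!
! Dead Peer Detection: probe every 10 s, retry every 3 s when a probe goes unanswered
crypto ikev2 dpd 10 3 periodic
!
! The /32 tunnel address leaves the Azure peer off any connected subnet,
! so route it explicitly through the tunnel
ip route 169.254.21.1 255.255.255.255 Tunnel1
!
! BGP: peer to the Azure instance-0 APIPA address, eBGP
router bgp 65010
 bgp log-neighbor-changes
 neighbor 169.254.21.1 remote-as 65515
 neighbor 169.254.21.1 ebgp-multihop 8
 neighbor 169.254.21.1 update-source Tunnel1
 !
 address-family ipv4
  ! network only advertises a prefix that exactly matches a route in the RIB
  network 192.168.10.0 mask 255.255.255.0
  neighbor 169.254.21.1 activate
 exit-address-family
```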
Three on-prem details that matter:
- ebgp-multihop. Azure’s BGP peer is not on a directly connected subnet of the /32 tunnel interface. An eBGP session with the default TTL of 1 will therefore fail to establish. You generally need `ebgp-multihop` (a value of 8 is safe) plus a /32 route to the Azure peer through the tunnel.
- DPD must be on. `crypto ikev2 dpd 10 3 periodic` sends a liveness check every 10 seconds. When a check goes unanswered, it retries every 3 seconds until the peer is declared dead and the SA is torn down. This is what makes failover fast: without DPD, a dead peer lingers until the SA lifetime expires. Match the intent of the Azure `dpd_timeout_seconds = 45`.
- Advertise summaries, activate per neighbor. The `network` statement injects your on-prem prefix into BGP; keep it summarized. Repeat the whole neighbor block for `169.254.21.5 remote-as 65515` (Azure instance 1) over Tunnel2, whose own address is 169.254.21.6.
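Because the on-prem values are fixed by what you set in Azure, you can have Terraform render the BGP neighbor lines instead of retyping them. This is only a sketch:

- The local and output names are hypothetical.
- It assumes Tunnel1 maps to instance 0 and Tunnel2 to instance 1.
- It covers BGP only, not the crypto stanzas.

```hcl
# Sketch: render Cisco IOS-style BGP lines from the Azure-side values so both ends match.
locals {
  # Azure APIPA peers in ip_configuration order (instance 0, then instance 1)
  stanza_azure_peers = flatten(azurerm_virtual_network_gateway.this.bgp_settings[0].peering_addresses[*].apipa_addresses)
  stanza_azure_asn   = azurerm_virtual_network_gateway.this.bgp_settings[0].asn
}

output "onprem_bgp_stanza" {
  description = "BGP neighbor block for the on-prem device, rendered from Terraform values."
  value = join("\n", flatten([
    "router bgp ${azurerm_local_network_gateway.onprem.bgp_settings[0].asn}",
    [for i, peer in local.stanza_azure_peers : [
      " neighbor ${peer} remote-as ${local.stanza_azure_asn}",
      " neighbor ${peer} ebgp-multihop 8",
      " neighbor ${peer} update-source Tunnel${i + 1}", # assumes TunnelN -> instance N-1
    ]],
    " address-family ipv4",
    [for peer in local.stanza_azure_peers : "  neighbor ${peer} activate"],
    " exit-address-family",
  ]))
}
```

`terraform output -raw onprem_bgp_stanza` prints the block. You can paste it in or diff it against the running config.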
Verify
Prove both the IPsec layer and the BGP layer independently. A connected tunnel with a dead BGP session is a silent failure waiting to bite you on failover.
Check connection status and that both tunnels are up:
```bash
# Both connections should report Connected
az network vpn-connection show \
  -g rg-hub-prod -n cn-onprem-dc1 \
  --query "{name:name, status:connectionStatus, ingress:ingressBytesTransferred, egress:egressBytesTransferred}" -o table

az network vpn-connection show \
  -g rg-hub-prod -n cn-onprem-dc1-t2 \
  --query "{name:name, status:connectionStatus}" -o table
```
Confirm BGP peers are established on the gateway (this is the real HA check):
```bash
az network vnet-gateway list-bgp-peer-status \
  -g rg-hub-prod -n vpngw-hub-prod \
  --query "value[].{peer:neighbor, state:state, asn:asn, routes:routesReceived}" -o table
```
You want state = Connected for both on-prem APIPA peers and a non-zero routesReceived. Then inspect what each side is exchanging:
```bash
# Routes Azure has LEARNED from on-prem (expect your summarized corp prefixes)
az network vnet-gateway list-learned-routes \
  -g rg-hub-prod -n vpngw-hub-prod \
  --query "value[].{network:network, nextHop:nextHop, asPath:asPath, source:sourcePeer}" -o table

# Routes Azure is ADVERTISING to a specific on-prem peer
az network vnet-gateway list-advertised-routes \
  -g rg-hub-prod -n vpngw-hub-prod --peer 169.254.21.2 \
  --query "value[].{network:network, nextHop:nextHop}" -o table
```
On the on-prem device, confirm the mirror image:
```
! Phase 1 up on both tunnels
show crypto ikev2 sa
! Phase 2 SAs installed, encrypting/decrypting
show crypto ipsec sa
! Both neighbors in Established, prefixes received
show ip bgp summary
! Azure VNet/spoke prefixes learned via BGP
show ip route bgp
```
A healthy deployment shows: both connections Connected, both BGP peers Connected/Established, on-prem corp prefixes in list-learned-routes, Azure VNet ranges in the on-prem BGP table, and live ingress/egress byte counters on both connections.
Testing failover: simulating tunnel loss and observing reconvergence
Do not trust HA you have not broken on purpose. The cleanest non-destructive test is to drop one tunnel and watch BGP reconverge while a continuous ping keeps running.
- Start a continuous ping from an on-prem host to an Azure VM (private IP) and leave it running in another window.
- Tear down tunnel 1 from the on-prem side only. For example, run `clear crypto session remote <AZURE_PIP_1>`, or shut the tunnel interface (`interface Tunnel1`, then `shutdown`). This avoids touching Azure and mimics a real path failure.
- Watch the ping. With BGP and DPD tuned, you should see a small number of dropped packets (single-digit, depending on DPD timers and BGP hold time) before traffic resumes over tunnel 2.
- Confirm the reconvergence on the Azure side:
```bash
# The downed peer should drop to a non-Connected state, the other stays up
az network vnet-gateway list-bgp-peer-status \
  -g rg-hub-prod -n vpngw-hub-prod \
  --query "value[].{peer:neighbor, state:state, routes:routesReceived}" -o table

# Learned routes should now show next-hop via the surviving peer only
az network vnet-gateway list-learned-routes \
  -g rg-hub-prod -n vpngw-hub-prod \
  --query "value[?network=='192.168.10.0/24'].{net:network, nextHop:nextHop}" -o table
```
- Bring tunnel 1 back (`no shutdown`), confirm the BGP session re-establishes, and verify the route reappears via both peers.
Failover speed is governed by your slowest detection timer — DPD on the IPsec layer and BGP hold time (default 180s, derived from a 60s keepalive) on the routing layer. If failover feels slow, BGP timers are usually the culprit. Lowering the keepalive/hold timers tightens reconvergence but increases sensitivity to jitter; tune deliberately, do not just slam them to the minimum.
Enterprise scenario
A payments platform ran active-active VPN gateways to two on-prem datacenters and reported “random” 30-90 second outages on failover, far worse than the single-digit packet loss they tested at launch. Their tunnels were healthy; the culprit was BGP path selection. On-prem advertised 192.168.0.0/16 over both tunnels with identical attributes, so Azure load-balanced (ECMP) across instance 0 and instance 1. When instance 0’s tunnel dropped, traffic on that path blackholed until the BGP hold timer (180s default, but they had tuned it to ~90s) expired and withdrew the route. DPD was tearing down IPsec fast, but BGP had not yet pulled the path.
The fix was twofold. First, they stopped relying on the default BGP timers and tightened the on-prem side to a 5s keepalive / 15s hold. BGP then withdrew a dead path on roughly the same timescale as DPD tore down IPsec. They accepted the jitter tradeoff because they ran on a clean MPLS underlay:
```
router bgp 65010
 neighbor 169.254.21.1 timers 5 15
 neighbor 169.254.21.5 timers 5 15
```
Second — the real win — they switched from symmetric advertisement to deterministic primary/backup using AS-path prepending on the secondary tunnel, so steady-state traffic was not ECMP-split across both instances and only one path had to converge on failure. They also added BFD-style fast detection where the on-prem platform supported it. Post-change failover dropped to under 3 seconds of loss. The lesson: active-active plus ECMP is throughput, not low-RTO HA. If you need deterministic sub-5s failover, make one path primary and let BGP timers, not just DPD, drive reconvergence.
Throughput tuning, SKU sizing, and diagnosing tunnel flaps
SKU sizing drives throughput, tunnel count, and BGP scale, all of which grow roughly with the tier from VpnGw1 through VpnGw5. The AZ variants add zone redundancy. Size on aggregate throughput across all tunnels, not per tunnel. A single IPsec tunnel will not saturate a large gateway, because per-tunnel throughput is capped well below the SKU aggregate. If you need more than one tunnel’s worth of bandwidth to a single site, that is a multi-tunnel or ExpressRoute conversation, not a bigger-VPN-SKU conversation.
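To make the sizing decision explicit and guarded, you can use a validated SKU variable. This sketch assumes you only allow the Generation 2 zone-redundant tiers (VpnGw2AZ through VpnGw5AZ), in line with the production recommendation in Step 1. The variable name is hypothetical.

```hcl
# Sketch: limit the gateway to Generation 2 zone-redundant SKUs (variable name is hypothetical).
variable "vpn_gateway_sku" {
  type    = string
  default = "VpnGw2AZ"

  validation {
    condition     = contains(["VpnGw2AZ", "VpnGw3AZ", "VpnGw4AZ", "VpnGw5AZ"], var.vpn_gateway_sku)
    error_message = "Use VpnGw2AZ-VpnGw5AZ: Basic has no BGP and non-AZ SKUs lack zone redundancy."
  }
}
```

Reference it as `sku = var.vpn_gateway_sku` on the gateway. VpnGw1AZ is left out because it is offered only as a Generation 1 SKU.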
Diagnose flaps from the logs, not the portal. Enable diagnostic settings on the gateway and stream the tunnel and route categories to Log Analytics:
```bash
az monitor diagnostic-settings create \
  --name vpngw-diag \
  --resource "$(az network vnet-gateway show -g rg-hub-prod -n vpngw-hub-prod --query id -o tsv)" \
  --workspace "$(az monitor log-analytics workspace show -g rg-hub-prod -n law-hub --query id -o tsv)" \
  --logs '[{"category":"TunnelDiagnosticLog","enabled":true},
           {"category":"RouteDiagnosticLog","enabled":true},
           {"category":"IKEDiagnosticLog","enabled":true},
           {"category":"GatewayDiagnosticLog","enabled":true}]'
```
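If the rest of the stack is Terraform, you can declare the same diagnostic setting instead of creating it by hand. This sketch assumes the `law-hub` workspace in `rg-hub-prod` from the CLI call above. Use either this or the CLI command, not both, because each would create the same `vpngw-diag` setting.

```hcl
# Sketch: Terraform counterpart of the CLI diagnostic setting above.
data "azurerm_log_analytics_workspace" "diag" {
  name                = "law-hub"
  resource_group_name = "rg-hub-prod"
}

resource "azurerm_monitor_diagnostic_setting" "vpngw" {
  name                       = "vpngw-diag"
  target_resource_id         = azurerm_virtual_network_gateway.this.id
  log_analytics_workspace_id = data.azurerm_log_analytics_workspace.diag.id

  # One enabled_log block per VPN gateway log category
  dynamic "enabled_log" {
    for_each = toset(["TunnelDiagnosticLog", "RouteDiagnosticLog", "IKEDiagnosticLog", "GatewayDiagnosticLog"])
    content {
      category = enabled_log.value
    }
  }
}
```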
Then a tunnel that keeps flapping shows its connect/disconnect events with a reason:
```kusto
AzureDiagnostics
| where Category == "TunnelDiagnosticLog"
| where TimeGenerated > ago(6h)
| project TimeGenerated, status_s, stateChangeReason_s, remoteIP_s
| order by TimeGenerated desc
```
A repeating connect/disconnect with stateChangeReason pointing at a policy or peer mismatch is almost always a crypto or APIPA asymmetry between the two sides. IKEDiagnosticLog will show the failed Phase 1 negotiation directly.
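Instead of running that query by hand after an incident, you can alert on it with a scheduled query alert over the same workspace. The following are assumptions to check against your own TunnelDiagnosticLog rows:

- The rule name.
- The 5-minute evaluation over a 30-minute window.
- The threshold of three disconnects.
- The `status_s` value `"Disconnected"`.

```hcl
# Sketch: alert when tunnels log repeated disconnects. Window, threshold, and the
# "Disconnected" status value are assumptions to tune against your own logs.
data "azurerm_log_analytics_workspace" "alerts" {
  name                = "law-hub"
  resource_group_name = "rg-hub-prod"
}

resource "azurerm_monitor_scheduled_query_rules_alert_v2" "tunnel_flaps" {
  name                 = "alert-vpngw-tunnel-flaps"
  resource_group_name  = azurerm_resource_group.hub.name
  location             = azurerm_resource_group.hub.location
  scopes               = [data.azurerm_log_analytics_workspace.alerts.id]
  evaluation_frequency = "PT5M"  # run the query every 5 minutes
  window_duration      = "PT30M" # over the last 30 minutes of logs
  severity             = 2

  criteria {
    query = <<-KQL
      AzureDiagnostics
      | where Category == "TunnelDiagnosticLog"
      | where status_s == "Disconnected"
    KQL
    time_aggregation_method = "Count"       # count matching rows
    operator                = "GreaterThan" # fire when the count exceeds the threshold
    threshold               = 3
  }
}
```

Without an `action` block the alert only surfaces in Azure Monitor. Attach an action group if someone should be paged.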
Pitfalls that cause production outages
- One side static, one side BGP. Mixing a static local network gateway with a BGP connection produces a tunnel that connects but never exchanges routes correctly. Commit to BGP on both ends.
- APIPA outside the reserved range. Anything outside 169.254.21.0-169.254.22.255 is silently wrong; the IPsec SA comes up and BGP never does.
- GCM integrity mismatch. `GCMAES*` encryption demands the same `GCMAES*` integrity value. A `SHA256` integrity paired with GCM encryption is rejected.
- Forgetting `ebgp-multihop` on-prem. The eBGP session needs TTL > 1 across the tunnel or it never establishes despite a healthy IPsec layer.
- Treating active-active as automatic HA. Without BGP, the second instance does not give you failover. The pairing only becomes HA when BGP can withdraw the dead path.
Build it active-active, pin a single hardened crypto policy on both ends, peer BGP over both tunnels, and then break a tunnel on purpose and watch the ping recover. Hybrid connectivity that has survived a deliberate failure is the only kind you should put production traffic on.