
Integrating SD-WAN into a Cloud Backbone: Partner NVAs, Branch Onboarding, and Route Exchange

Most enterprises do not retire their SD-WAN when they go all-in on a cloud backbone. They want the branch overlay and the cloud transit fabric to be one routing domain: a store in Ohio reaching a workload in East US 2 should traverse the SD-WAN overlay to the nearest cloud hub, then ride the provider backbone the rest of the way, with routes learned dynamically and failover handled in the data plane. This article walks the full integration on Azure Virtual WAN with a partner SD-WAN appliance in the hub, covers the equivalent AWS Transit Gateway / Cloud WAN Connect pattern, and is opinionated about the parts that bite: BGP loop avoidance, breakout vs backhaul, and sizing the underlay you actually need.

I will use Arista VeloCloud as the concrete partner. As of mid-2026 this matters: new deployments of VMware SD-WAN in Azure Virtual WAN are blocked at the end of June 2026 — existing deployments keep running, but for anything green-field you deploy the arista-velocloud-sdwan managed application instead. The architecture below is identical across the supported connectivity partners (Barracuda, Cisco Catalyst SD-WAN, Versa, HPE Aruba EdgeConnect).

1. SD-WAN vs traditional WAN: what actually changes

Traditional WAN routes packets per-prefix over a fixed circuit. SD-WAN builds an encrypted overlay of tunnels between edge devices and a controller-orchestrated fabric, then makes forwarding decisions per-application on top of that overlay. Three properties drive the cloud integration design:

The cloud backbone (Azure Virtual WAN, AWS Cloud WAN) is the transit core the overlay plugs into. The win is that branch-to-cloud and branch-to-branch-via-cloud become managed, BGP-driven, and any-to-any, instead of a mesh of point-to-point IPsec tunnels you hand-maintain.

2. Insertion models: NVA in the hub vs native integration

There are two ways to land an SD-WAN overlay into the cloud backbone, and choosing wrong is expensive to undo.

| Model | What it is | When to use |
|---|---|---|
| Partner NVA in the hub | The SD-WAN vendor’s gateway runs inside the managed virtual hub as a first-class appliance, peering with the hub router. | Branches terminate SD-WAN tunnels directly on a cloud-resident gateway; you want vendor parity with on-prem edges and unified orchestration. |
| Native / indirect (NVA in a spoke VNet) | The appliance runs in a normal spoke VNet; it peers with the hub router over BGP or connects via VPN/Connect. | You want full control of the VM, a vendor not in the managed program, or a transitional design. |

On Azure, the managed-NVA-in-hub model is the clean one: the appliance is Availability Zone aware and automatically deployed highly available, it peers with the Virtual WAN hub router and participates in routing decisions like a Microsoft gateway, and you do not create Site, Site-to-Site connection, or P2S resources for branches — the appliance and its orchestrator own that. You still create hub-to-VNet connections for your workloads.

Two hard constraints to internalize before you build:

A managed NVA requires a Standard virtual hub (not Basic). Licensing is BYOL only — you buy the SD-WAN license from the vendor; Microsoft bills the NVA Infrastructure Units and underlying resources separately.

3. Deploy the partner appliance HA pair and wire it to the hub router

3a. Size the hub and the NVA scale unit

The unit of capacity is the NVA Infrastructure Unit: 1 unit = 500 Mbps of aggregate throughput, and a deployment can range from 2 to 80 units (the vendor publishes which scale points they actually support). That 500 Mbps is raw infrastructure throughput — encryption, encapsulation, and any DPI eat into it, so the effective SD-WAN throughput per unit is vendor-specific and lower. Size to peak, not to best-case test numbers, because the n+1 instance model only protects you when you are under the rated ceiling.

Instance count scales with the chosen unit, which matters for IP planning:

| Scale units | Instances deployed |
|---|---|
| 2-20 | 2 |
| 30-40 | 3 |
| 60 | 4 |
| 80 | 5 |
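The sizing rule (size to effective peak, then round up to a scale point the vendor supports) can be sketched in a few lines. This is a minimal sketch, assuming a vendor-specific "usable percentage" of the raw 500 Mbps per unit (the 60% in the usage line is a placeholder, not a published figure) and a hypothetical list of supported scale points; substitute your vendor's numbers.

```python
# Hypothetical scale points: replace with the list your SD-WAN vendor supports.
SUPPORTED_UNITS = (2, 4, 10, 20, 30, 40, 60, 80)
RAW_MBPS_PER_UNIT = 500  # 1 NVA Infrastructure Unit = 500 Mbps raw


def units_for_peak(peak_mbps: int, usable_pct: int) -> int:
    """Smallest supported scale unit whose *effective* throughput covers peak.

    usable_pct is the share of the raw 500 Mbps left after encryption,
    encapsulation, and DPI; it is vendor-specific and assumed here.
    """
    # Integer ceiling division: ceil(peak / (500 * usable_pct / 100)).
    needed = -(-peak_mbps * 100 // (RAW_MBPS_PER_UNIT * usable_pct))
    for units in SUPPORTED_UNITS:
        if units >= needed:
            return units
    raise ValueError(f"{peak_mbps} Mbps needs {needed} units; the ceiling is 80")


def instances_for_units(units: int) -> int:
    """Instance count per the table: 2-20 -> 2, 30-40 -> 3, 60 -> 4, 80 -> 5."""
    if units <= 20:
        return 2
    if units <= 40:
        return 3
    if units <= 60:
        return 4
    return 5


units = units_for_peak(7000, usable_pct=60)
print(units, instances_for_units(units))  # 30 3
```

Note the step: 7 Gbps at 300 Mbps effective per unit needs 24 units, which rounds up to 30. That also moves the deployment from two instances to three, which draws on the fixed NVA subnet IP pool discussed next.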

The hub address space drives how many NVA interface IPs you get, and the NVA subnets cannot be resized after creation. Use /23 minimum (required for NVAs with more than two NICs); a /23 gives 11 IPs each for the internal and external NVA subnets, /22 gives 27, /21 gives 59. Pick generously — adding instances or IP configs later consumes from the same fixed pool.
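The quoted counts are consistent with two assumptions: each NVA subnet is carved five bits longer than the hub prefix (a /23 hub yields /28 NVA subnets), and Azure reserves five addresses in every subnet. The sketch below encodes those assumptions, which are inferred from the published figures rather than taken from Azure documentation, and reproduces 11, 27, and 59.

```python
import ipaddress

AZURE_RESERVED_PER_SUBNET = 5  # network, default gateway, 2x DNS, broadcast
NVA_SUBNET_EXTRA_BITS = 5      # assumption inferred from the published counts


def nva_subnet_ips(hub_prefix: str) -> int:
    """Usable IPs in each of the internal/external NVA subnets for a hub prefix."""
    hub = ipaddress.ip_network(hub_prefix)
    if hub.prefixlen > 23:
        raise ValueError("use a /23 or larger hub address space for NVAs")
    subnet = next(hub.subnets(new_prefix=hub.prefixlen + NVA_SUBNET_EXTRA_BITS))
    return subnet.num_addresses - AZURE_RESERVED_PER_SUBNET


for prefix in ("10.100.0.0/23", "10.100.0.0/22", "10.100.0.0/21"):
    print(prefix, nva_subnet_ips(prefix))
# 10.100.0.0/23 11
# 10.100.0.0/22 27
# 10.100.0.0/21 59
```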

3b. Stand up the hub, then deploy the managed NVA

The hub and Virtual WAN are plain infrastructure-as-code. The NVA itself is a Marketplace managed application, so you deploy it from the Marketplace offer (it creates a customer resource group plus a locked, publisher-controlled managed resource group holding the NetworkVirtualAppliances resource) and then configure it from the vendor’s orchestrator.

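resource "azurerm_resource_group" "net" {
  name     = "net-rg"   # same resource group the CLI steps below use
  location = "eastus2"
}

resource "azurerm_virtual_wan" "core" {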
  name                = "vwan-core"
  resource_group_name = azurerm_resource_group.net.name
  location            = "eastus2"
}

resource "azurerm_virtual_hub" "eus2" {
  name                = "hub-eus2"
  resource_group_name = azurerm_resource_group.net.name
  location            = "eastus2"
  virtual_wan_id      = azurerm_virtual_wan.core.id
  address_prefix      = "10.100.0.0/23"   # /23 minimum for NVA + room to grow
  sku                 = "Standard"        # NVA in hub requires Standard
}

The NVA resource is provisioned by the managed app; the supported, vendor-agnostic way to create it directly is the virtual-appliance API. After deployment you confirm placement and scale:

# Confirm the NVA landed in the right hub and note its vendor and scale-unit profile
az network virtual-appliance show \
  --name velocloud-eus2 --resource-group <managed-rg> \
  --query "{hub:virtualHub.id, units:nvaSku.bundledScaleUnit, vendor:nvaSku.vendor}" -o jsonc

Direct SSH/console access to the appliances is intentionally not available; all SD-WAN config (interfaces, business policy, branch profiles) happens in the VeloCloud Orchestrator, which the managed app bootstraps.

4. Branch onboarding: zero-touch provisioning and tunnel templates

This is where SD-WAN earns its keep. You do not hand-configure tunnels per branch; you template once and let ZTP do the rest.

The flow:

  1. Pre-stage in the Orchestrator. Create the branch as a managed Edge, assign it a Profile (the tunnel template: which overlay links, QoS classes, business policy, and which cloud hub gateways to register with). Generate an Activation Key per Edge.
  2. Bootstrap the credential. The activation key (a one-time token) is the trust anchor. Ship the appliance to the branch; an on-site tech powers it on and either enters the key or scans it. The Edge calls home to the Orchestrator over the internet, authenticates with the key, and is issued its long-lived certificate.
  3. Auto-build tunnels. Once activated, the Edge pulls its profile and automatically establishes overlay tunnels to the in-hub gateway(s) for every cloud hub in scope. No per-branch tunnel config touches the cloud side — the in-hub NVA already advertises itself to the overlay.
  4. Route exchange begins. The branch LAN prefixes flow up the overlay to the in-hub NVA, which redistributes them into BGP toward the hub router (Section 5).

Treat ZTP enrollment as software config, not clicks. A minimal branch profile expressed as data, fed to the Orchestrator API or your config pipeline:

# branch profile: store-ohio-014 (consumed by the SD-WAN orchestrator API)
edge:
  name: store-ohio-014
  profile: retail-branch-standard
  activation_key_ttl_days: 30        # key expires if not used; rotate, never reuse
links:
  - interface: GE3            # broadband primary
    type: public
    bandwidth_mbps_up: 200
  - interface: LTE1           # cellular backup
    type: public
    backup: true
cloud_gateways:
  - hub: hub-eus2             # register overlay to the in-hub NVA
  - hub: hub-scus            # second hub for geo-redundancy
lan_subnets:
  - 10.214.14.0/24           # advertised into the overlay -> hub router via BGP
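A pre-flight check in the pipeline catches the profile mistakes that are expensive after ZTP, notably a LAN subnet that falls outside the regional supernet the cloud side will eventually advertise. This is a minimal sketch, assuming the profile above has already been parsed into a dict (the field names mirror that illustrative schema, not a VeloCloud API) and that Ohio stores live in 10.214.0.0/16, the regional supernet used in the scenario later in this article.

```python
import ipaddress

MAX_KEY_TTL_DAYS = 30


def validate_profile(profile: dict, regional_supernet: str) -> list:
    """Return a list of problems; an empty list means the profile may be pushed."""
    problems = []
    supernet = ipaddress.ip_network(regional_supernet)
    if profile["edge"]["activation_key_ttl_days"] > MAX_KEY_TTL_DAYS:
        problems.append("activation key TTL exceeds policy")
    if not any(not link.get("backup", False) for link in profile["links"]):
        problems.append("no primary (non-backup) link")
    hubs = {gw["hub"] for gw in profile["cloud_gateways"]}
    if len(hubs) < 2:
        problems.append("fewer than two cloud hubs: no geo-redundancy")
    for cidr in profile["lan_subnets"]:
        if not ipaddress.ip_network(cidr).subnet_of(supernet):
            problems.append(f"{cidr} is outside regional supernet {supernet}")
    return problems


store = {
    "edge": {"name": "store-ohio-014", "profile": "retail-branch-standard",
             "activation_key_ttl_days": 30},
    "links": [
        {"interface": "GE3", "type": "public", "bandwidth_mbps_up": 200},
        {"interface": "LTE1", "type": "public", "backup": True},
    ],
    "cloud_gateways": [{"hub": "hub-eus2"}, {"hub": "hub-scus"}],
    "lan_subnets": ["10.214.14.0/24"],
}
print(validate_profile(store, "10.214.0.0/16"))  # []
```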

Operational rule: activation keys are bearer credentials. Scope a short TTL, deliver them out-of-band from the hardware, and revoke on any failed/aborted onboarding. A leaked key plus a spare appliance is an unauthorized tunnel into your backbone.
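The bearer-credential rules reduce to three checks at redemption time: single use, unexpired, not revoked. The following in-memory sketch models that lifecycle. The class and method names are hypothetical stand-ins for whatever your orchestrator or onboarding service actually enforces, and burning the key on a mismatched Edge is a design choice made here, not documented vendor behavior.

```python
import secrets
from datetime import datetime, timedelta, timezone


class ActivationKeys:
    """Hypothetical one-time activation keys with a TTL and explicit revocation."""

    def __init__(self, ttl_days: int = 30):
        self.ttl = timedelta(days=ttl_days)
        self._keys = {}  # key -> (edge name, expiry)

    def issue(self, edge: str, now: datetime) -> str:
        key = secrets.token_urlsafe(24)
        self._keys[key] = (edge, now + self.ttl)
        return key

    def revoke(self, key: str) -> None:
        """Call on any failed or aborted onboarding."""
        self._keys.pop(key, None)

    def redeem(self, key: str, edge: str, now: datetime) -> bool:
        # pop() consumes the key on ANY attempt, so reuse, a wrong Edge,
        # or late use all fail and leave nothing behind to retry with.
        entry = self._keys.pop(key, None)
        if entry is None:
            return False
        bound_edge, expiry = entry
        return bound_edge == edge and now < expiry


t0 = datetime(2026, 7, 1, tzinfo=timezone.utc)
keys = ActivationKeys(ttl_days=30)
k = keys.issue("store-ohio-014", t0)
print(keys.redeem(k, "store-ohio-014", t0 + timedelta(days=1)))  # True
print(keys.redeem(k, "store-ohio-014", t0 + timedelta(days=1)))  # False (reuse)
```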

5. BGP between the SD-WAN overlay and the cloud backbone

The in-hub NVA learns branch prefixes from the overlay and must hand them to the cloud backbone — and learn cloud/VNet prefixes back — over BGP. With the managed-in-hub model this peering is built in: the NVA peers with the hub router automatically. If you instead run the appliance in a spoke VNet, you configure the peering explicitly, and the rules below are where designs go wrong.

Hub-router BGP facts you must respect (Azure):

A spoke-NVA peering in IaC, with the ASN and dual-IP rules applied:

# NVA running in a spoke VNet, peering with the hub router over BGP.
resource "azurerm_virtual_hub_bgp_connection" "velo" {
  name           = "velo-overlay-peer"
  virtual_hub_id = azurerm_virtual_hub.eus2.id
  peer_asn       = 65010                       # != hub ASN, not in any reserved range
  peer_ip        = "10.110.0.5"                # NVA interface IP (no loopbacks)

  virtual_network_connection_id = azurerm_virtual_hub_connection.velo_spoke.id
}

# The hub router runs two instances: configure the NVA to peer with BOTH IPs.
output "hub_router_ips" {
  value = azurerm_virtual_hub.eus2.virtual_router_ips
}
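The ASN rule is easy to enforce as a pipeline check before the peering is applied. This sketch assumes the reserved lists as I understand them from Azure's BGP documentation (hub ASN 65515; Azure-reserved 8074, 8075, 12076 and 65517-65520; IANA-reserved 0, 23456, 64496-64511, 65535-65551, 4294967295). Verify them against current documentation before relying on the check.

```python
HUB_ASN = 65515
AZURE_RESERVED = {8074, 8075, 12076, 65515, 65517, 65518, 65519, 65520}
IANA_RESERVED = [(0, 0), (23456, 23456), (64496, 64511), (65535, 65551),
                 (4294967295, 4294967295)]


def peer_asn_problems(asn: int) -> list:
    """Reasons an overlay ASN is unsafe for peering with the hub router."""
    problems = []
    if asn == HUB_ASN:
        problems.append("collides with the hub router ASN")
    if asn in AZURE_RESERVED:
        problems.append("Azure-reserved ASN")
    if any(lo <= asn <= hi for lo, hi in IANA_RESERVED):
        problems.append("IANA-reserved ASN")
    return problems


print(peer_asn_problems(65010))  # []
print(peer_asn_problems(64500))  # ['IANA-reserved ASN']
```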

Avoiding route loops

Loop avoidance here is mostly about understanding what the hub will and will not re-advertise:
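Whatever the hub re-advertises, the last line of defense is standard eBGP loop prevention (RFC 4271): a speaker discards any route whose AS_PATH already contains its own ASN. The following toy model shows that check. It assumes overlay ASN 65010 and hub ASN 65515 as above; the prefix and the echo path are illustrative, not captured from a real hub.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple = ()  # leftmost entry = most recently traversed AS


def advertise(route: Route, sender_asn: int) -> Route:
    """eBGP export: the sender prepends its own ASN."""
    return Route(route.prefix, (sender_asn,) + route.as_path)


def accept(route: Route, local_asn: int) -> bool:
    """eBGP import loop check: drop routes that already crossed local_asn."""
    return local_asn not in route.as_path


OVERLAY_ASN, HUB_ASN = 65010, 65515
branch = Route("10.214.0.0/16")
at_hub = advertise(branch, OVERLAY_ASN)  # hub receives (65010,)
print(accept(at_hub, HUB_ASN))           # True
echoed = advertise(at_hub, HUB_ASN)      # echoed back as (65515, 65010)
print(accept(echoed, OVERLAY_ASN))       # False: the NVA drops its own route
```

The same check has a flip side. A second NVA that reuses 65010 in another hub will also reject branch routes learned through the first hub. Overlay ASN reuse across hubs is therefore a deliberate design decision, not a default.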

6. Traffic steering: breakout vs backhaul

App-aware steering is the reason SD-WAN exists; the integration job is to make the cloud egress policy agree with the overlay’s business policy.

On a secured virtual hub, you express backhaul intent with Routing Intent, not hand-built UDRs. Note the dependency: BGP peering between a spoke NVA and a secured hub is supported only when Routing Intent is configured.

# Send all internet-bound and private traffic to the hub's security stack.
# Branch SaaS that breaks out locally never enters the hub, so it is unaffected.
az network vhub routing-intent create \
  --name eus2-intent \
  --vhub hub-eus2 --resource-group net-rg \
  --routing-policies "[{name:Internet,destinations:[Internet],next-hop:<azfw-or-nva-id>},{name:PrivateTraffic,destinations:[PrivateTraffic],next-hop:<azfw-or-nva-id>}]"

The principle: keep the two policy planes consistent. If the overlay breaks a SaaS app out locally but the cloud hub’s egress firewall expects to inspect it, you have a coverage gap. Decide per app category in one place and reflect it in both.
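One way to keep the planes from drifting is a single intent table per app category, checked against what each plane is actually configured to do. This is a minimal sketch: the category names, the intent table, and the two "actual" sets are hypothetical inputs you would export from the orchestrator's business policy and the hub firewall's rule set.

```python
# Single source of truth: app category -> "breakout" | "backhaul" (illustrative).
INTENT = {
    "m365": "breakout",
    "pos": "backhaul",
    "inventory": "backhaul",
    "general-web": "backhaul",
}


def policy_gaps(overlay_breakout: set, hub_inspected: set) -> list:
    """Compare both policy planes against INTENT and report every mismatch."""
    gaps = []
    for app, intent in INTENT.items():
        breaks_out = app in overlay_breakout
        if intent == "breakout" and not breaks_out:
            gaps.append(f"{app}: intent is breakout but the overlay backhauls it")
        if intent == "backhaul" and breaks_out:
            gaps.append(f"{app}: overlay breaks out traffic the hub expects to inspect")
        if intent == "backhaul" and app not in hub_inspected:
            gaps.append(f"{app}: backhauled but no hub inspection rule")
    return gaps


inspected = {"pos", "inventory", "general-web"}
print(policy_gaps({"m365"}, inspected))                 # []
print(policy_gaps({"m365", "general-web"}, inspected))
```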

7. Capacity and the underlay bandwidth you actually need

Two numbers, do not conflate them:

MTU is the silent killer. The overlay encapsulates (IPsec, often over the provider’s own encap), so a 1500-byte inner packet plus headers exceeds path MTU and fragments or black-holes large flows. On AWS Connect the math is explicit — set the GRE tunnel MTU to external MTU minus 24 bytes (4-byte GRE + 20-byte outer IP): 1500 - 24 = 1476. On any overlay, leave room and enable TCP MSS clamping at the edge so handshakes negotiate a safe segment size instead of relying on PMTUD through tunnels that often drop ICMP.
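The GRE arithmetic and the MSS value to clamp to fit in a small helper. The 40-byte allowance assumes IPv4 and TCP with no options. The `extra_overhead` parameter is a placeholder for IPsec or vendor encapsulation, whose size depends on cipher, mode, and NAT-T, so take that figure from your vendor.

```python
GRE_HEADER = 4       # bytes
OUTER_IPV4 = 20      # bytes
INNER_IPV4_TCP = 40  # 20-byte IPv4 + 20-byte TCP, no options


def gre_tunnel_mtu(external_mtu: int) -> int:
    """Tunnel MTU = external MTU minus GRE and outer IPv4 headers."""
    return external_mtu - GRE_HEADER - OUTER_IPV4


def clamp_mss(tunnel_mtu: int, extra_overhead: int = 0) -> int:
    """Largest TCP segment that fits inside the tunnel without fragmenting."""
    return tunnel_mtu - extra_overhead - INNER_IPV4_TCP


print(gre_tunnel_mtu(1500))             # 1476
print(clamp_mss(gre_tunnel_mtu(1500)))  # 1436
```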

The AWS equivalent: Transit Gateway / Cloud WAN Connect

If your backbone is AWS, the analogous construct is a Connect attachment layered on an existing transport attachment (a VPC or Direct Connect attachment), with GRE tunnels (Connect peers) to the SD-WAN appliance and BGP for routing. The specifics worth knowing:

# AWS: GRE Connect peer to an SD-WAN appliance, dual BGP sessions implied.
aws ec2 create-transit-gateway-connect-peer \
  --transit-gateway-attachment-id tgw-attach-0abc123 \
  --peer-address 10.20.0.10 \
  --inside-cidr-blocks 169.254.6.0/29 \
  --bgp-options PeerAsn=65010
# Appliance side: GRE tunnel MTU = 1500 - 24 = 1476, MSS clamp on, BGP from 169.254.6.1
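The addressing inside the /29 follows a fixed convention: the first usable address goes to the appliance and the next two go to the transit gateway. That is why one Connect peer yields two BGP sessions. The sketch below derives the appliance-side addresses under that convention. It also assumes the reserved-block list as I understand it from the Transit Gateway Connect documentation, so verify both for your region and address family.

```python
import ipaddress

# Blocks AWS documents as unavailable for Connect peer inside CIDRs (verify).
RESERVED = [ipaddress.ip_network(f"169.254.{n}.0/29") for n in range(6)]
RESERVED.append(ipaddress.ip_network("169.254.169.248/29"))
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def connect_peer_addresses(inside_cidr: str) -> dict:
    """BGP addresses for one Connect peer: 1 appliance side, 2 TGW side."""
    block = ipaddress.ip_network(inside_cidr)
    if block.prefixlen != 29 or not block.subnet_of(LINK_LOCAL):
        raise ValueError("inside CIDR must be a /29 from 169.254.0.0/16")
    if block in RESERVED:
        raise ValueError(f"{block} is reserved")
    hosts = list(block.hosts())  # .1 through .6
    return {"appliance_bgp": str(hosts[0]),
            "tgw_bgp": [str(hosts[1]), str(hosts[2])]}


print(connect_peer_addresses("169.254.6.0/29"))
# {'appliance_bgp': '169.254.6.1', 'tgw_bgp': ['169.254.6.2', '169.254.6.3']}
```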

Enterprise scenario

A retail platform team migrated 240 stores from MPLS to VeloCloud SD-WAN with two Azure Virtual WAN hubs (East US 2 primary, South Central US secondary) and the appliance in the hub. POS, inventory, and back-office traffic backhauled through the hubs to an inspected egress; Microsoft 365 broke out locally at each store. They advertised every store LAN as an individual /24 from the in-hub NVA into the hub router — 240 prefixes, plus their VNet and ExpressRoute routes. It worked in the pilot of 30 stores and passed UAT.

The constraint hit during national rollout. As store count climbed and a few sites advertised secondary subnets, the prefix count from the overlay marched toward the 4,000-route NVA limit while the hub’s total crept toward the 10,000-route ceiling as VNets and ExpressRoute prefixes piled on. More damaging operationally: every store add or subnet change re-advertised a fresh /24, and hub-side reconvergence across hundreds of discrete prefixes during the nightly maintenance window produced 30-60 second windows where some stores saw POS timeouts. The team had also left routing_weight defaults such that during a hub failover, return traffic for a subset of prefixes came back over the secondary hub while the overlay still preferred primary — classic asymmetry, and the inspecting firewall dropped those flows.

The fix was structural. First, summarize: VeloCloud was configured to advertise a single regional supernet per geography (10.214.0.0/16, 10.220.0.0/16, …) from the in-hub NVA instead of per-store /24s, collapsing 240+ prefixes to a handful and dropping hub reconvergence to sub-second on store changes. Branch-specific reachability stayed intact because the overlay itself still carried the specific routes between Edges; only the cloud-facing advertisement was aggregated. Second, they made hub preference deterministic and symmetric so failover egress and the overlay’s preferred hub always agreed. The corrected, aggregated peering:

resource "azurerm_virtual_hub_bgp_connection" "velo_eus2" {
  name           = "velo-overlay-peer"
  virtual_hub_id = azurerm_virtual_hub.eus2.id
  peer_asn       = 65010      # overlay ASN, distinct from hub, not reserved
  peer_ip        = "10.110.0.5"

  virtual_network_connection_id = azurerm_virtual_hub_connection.velo_spoke.id
}
# NVA business policy advertises regional supernets only:
#   10.214.0.0/16, 10.220.0.0/16 ...  -> well under the 4,000-route NVA cap,
#   stable through store adds, fast hub reconvergence on failure.
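The aggregation can be checked offline before it is pushed to the NVA. The check maps each store /24 to its regional supernet, advertises only supernets that cover at least one store, and fails loudly if a store falls outside every supernet. The 120-stores-per-region split below is invented for the example.

```python
import ipaddress


def summarize(stores, supernets):
    """Return the regional supernets to advertise for the given store prefixes."""
    nets = [ipaddress.ip_network(s) for s in supernets]
    advertised = set()
    for cidr in stores:
        store = ipaddress.ip_network(cidr)
        covering = [n for n in nets if store.subnet_of(n)]
        if not covering:
            raise ValueError(f"{store} is not inside any regional supernet")
        advertised.add(covering[0])
    return [str(n) for n in sorted(advertised)]


stores = ([f"10.214.{i}.0/24" for i in range(120)]
          + [f"10.220.{i}.0/24" for i in range(120)])
print(len(stores), summarize(stores, ["10.214.0.0/16", "10.220.0.0/16"]))
# 240 ['10.214.0.0/16', '10.220.0.0/16']
```

This is a policy choice, not a lossless collapse. `ipaddress.collapse_addresses` only merges complete, aligned blocks, so it would yield four prefixes per region for these 120 contiguous /24s. The trade-off of a fixed supernet is that it also attracts traffic for unallocated /24s toward the NVA.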

They added a synthetic check that counts prefixes received by the hub from the overlay and alarms at 70% of the 4,000 limit, plus a per-region return-path probe that fails if traffic egresses the non-preferred hub. The route-scale class of incident became a dashboard threshold instead of a 2 a.m. page.
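Both checks reduce to simple comparisons once the inputs are collected. This sketch assumes you already export the prefix count the hub received from the overlay, the hub's total route count, and, per region, both the hub the overlay prefers and the hub the probe saw return traffic through. The function names and data shapes are hypothetical. Applying the same 70% threshold to the 10,000-route hub ceiling is an extension of the team's 4,000-route alarm.

```python
NVA_ROUTE_LIMIT = 4000
HUB_ROUTE_LIMIT = 10000
ALARM_PCT = 70


def route_scale_alarms(from_overlay: int, hub_total: int) -> list:
    """Alarm at 70% of each route ceiling (integer math avoids float edges)."""
    alarms = []
    if from_overlay * 100 >= NVA_ROUTE_LIMIT * ALARM_PCT:
        alarms.append(f"overlay prefixes {from_overlay} >= {ALARM_PCT}% of {NVA_ROUTE_LIMIT}")
    if hub_total * 100 >= HUB_ROUTE_LIMIT * ALARM_PCT:
        alarms.append(f"hub routes {hub_total} >= {ALARM_PCT}% of {HUB_ROUTE_LIMIT}")
    return alarms


def asymmetric_regions(preferred: dict, observed_return: dict) -> list:
    """Regions whose return traffic came back through a non-preferred hub."""
    return sorted(r for r, hub in preferred.items() if observed_return.get(r) != hub)


print(route_scale_alarms(12, 3000))  # []
print(asymmetric_regions({"ohio": "hub-eus2", "texas": "hub-scus"},
                         {"ohio": "hub-eus2", "texas": "hub-eus2"}))  # ['texas']
```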

Verify

Confirm the integration end to end before you onboard real branches.

# BGP peerings on the hub: the overlay peer should be listed and provisioned
az network vhub bgpconnection list \
  --vhub-name hub-eus2 --resource-group net-rg -o table

# Effective routes the hub programmed: branch supernets should show next hop = the BGP peer
az network vhub get-effective-routes \
  --name hub-eus2 --resource-group net-rg \
  --resource-type RouteTable \
  --resource-id <defaultRouteTable-id> -o table

Failover drill checklist
