Deterministic Outbound with Azure NAT Gateway: Fixing SNAT Port Exhaustion

The first sign is always the same: a batch job, a webhook fan-out, or a connection-pool-happy microservice throws intermittent connection timeouts to one external API, and nobody can reproduce it locally. The metric that explains it is SNATConnectionCount flatlining at a ceiling and DroppedPackets ticking up — you have run out of SNAT ports. Every VM and pod that talks to the public internet needs source NAT: its private IP and ephemeral port get rewritten to a public IP and port so the return traffic can find its way home. Azure gives you three ways to supply that public IP, and they behave so differently under load that the choice is the difference between a service that scales and one that pages you at 18:00 every day.

Azure NAT Gateway is the correct fix. It is a fully managed, highly available outbound-only resource that you attach to a subnet; from that moment, every flow leaving the subnet is translated through your public IP (or a contiguous public IP prefix you can hand to partners for allow-listing), and SNAT ports are allocated on demand from a large shared pool rather than pre-carved per instance. That single dynamic-allocation property is why it survives the load that exhausts default outbound and Load Balancer outbound rules. This article is the deep, exam-grade, production-real treatment: every outbound path compared, the 5-tuple SNAT model that explains why exhaustion happens, prefix sizing maths, AKS outboundType integration, idle-timeout tuning, and a full symptom→cause→confirm→fix playbook — with az, Bicep, KQL, and a dense reference of tables you can scan mid-incident.

By the end you will stop guessing about egress. When the pager goes off with “intermittent timeouts to the payment provider,” you will know within ninety seconds whether you are looking at SNAT exhaustion against a single destination VIP, an idle-timeout reclaiming long-lived flows, a Basic-SKU resource silently blocking the association, or a zone-redundant cluster pinned to one zone’s NAT Gateway — and you will know the exact az command and metric that confirms it. Deterministic, allow-listable, exhaustion-proof egress is a solved problem once you understand the mechanism; this is how to solve it properly and migrate off the default outbound path without an outage.

What problem this solves

Two distinct production pains converge on this one resource, and most teams meet them in this order.

The first is SNAT port exhaustion. Your workload opens many concurrent outbound connections — a notification service fanning out to a webhook host, a reconciliation job hammering one bank API, a microservice with a misconfigured connection pool — and the platform runs out of translation-table entries for that destination. New connections fail. It surfaces as intermittent 5xx, dependency timeouts, and connection resets, under load and not at rest, which is exactly why it passes every test and dies in production during the busy window. On default outbound the per-host SNAT pool is small and shared; on Load Balancer outbound rules it is a fixed budget you must pre-divide across the backend pool and inevitably get wrong.

The second pain is unpredictable, un-allow-listable egress IPs. A partner — a bank, a SaaS API, a regulator’s endpoint — says “give us the source IPs your traffic comes from and we will allow-list them.” With default outbound access you cannot: the egress IP is Microsoft-owned, shared, and can change. With Load Balancer outbound it is the LB’s frontend IP, which works but couples your egress identity to an inbound resource you may not want. NAT Gateway gives you a stable public IP or a contiguous CIDR prefix that is yours, that you publish once, and that never changes as you scale within the prefix.

Who hits this: any subnet making meaningful outbound calls. It bites hardest on high-fan-out batch and event-driven workloads (webhooks, reconciliation, scrapers), on AKS clusters at high pod density (the default outboundType: loadBalancer inherits LB SNAT limits), and on any enterprise integration where a partner demands a fixed source-IP allow-list. There is a third, quieter motivation: Microsoft is retiring default outbound access for newly created VNets (effective 30 September 2025), so new subnets must have an explicit outbound method anyway — and NAT Gateway is the recommended default. The three problems below frame the whole field:

Problem	What you observe in production	Root mechanism	NAT Gateway’s answer
SNAT exhaustion	Intermittent timeouts/5xx to one upstream under load, fine at rest	Translation-table entries for one 5-tuple destination run out	On-demand ports from a large shared pool (64,512 per public IP)
Unpredictable egress IP	Partner cannot allow-list you; egress IP changes	Default outbound uses a shared, Microsoft-owned IP	Stable, owned public IP / contiguous prefix you publish
Default-outbound retirement	New VNets cannot rely on implicit egress (Sept 2025)	Implicit egress being removed for new VNets	Explicit, recommended outbound method per subnet

Learning objectives

By the end of this article you can:

Compare the three outbound paths (default outbound, Load Balancer outbound rules, NAT Gateway) on SNAT allocation, egress-IP stability, and scale — and justify NAT Gateway as the default.
Explain SNAT port allocation through the 5-tuple translation model, and why exhaustion almost always means many flows to a single destination IP and port.
Size a public IP prefix from your peak concurrent-flow count using ceil(flows / 64,512), and respect the 16-IP-per-NAT-Gateway ceiling.
Provision a NAT Gateway, public IP prefix, and subnet association with both az CLI and Bicep, and reason about the outbound precedence rules (what wins when multiple egress configs coexist).
Integrate NAT Gateway with AKS via managedNATGateway and userAssignedNATGateway outboundType, including the zone-redundant one-NAT-Gateway-per-zone topology.
Tune the TCP idle timeout (4–120 minutes) as a capacity-planning lever and pair it with application keepalives instead of cranking it blindly.
Drive the diagnostic signals — SNATConnectionCount, DroppedPackets, TotalConnectionCount — and confirm or rule out exhaustion with exact az and KQL.

Prerequisites & where this fits

You should already understand the basics of an Azure virtual network: a VNet is an address space, subnets carve it up, and resources land in subnets. Knowing what a public IP and a Standard Load Balancer are will help, as will comfort running az in Cloud Shell, reading JSON output, and recognising the difference between inbound and outbound traffic. If you have ever seen “SNAT” in a metric blade and not known what it meant, you are the target reader.

This sits in the Networking track and is the egress-side companion to the inbound and routing material. The VNet and subnet fundamentals come from Azure VNet Deep Dive: Every Setting. The alternative outbound mechanism — and when you would still reach for it — is covered in Standard Load Balancer Outbound Rules, Cross-Region & HA Ports. For PaaS targets you often want to bypass SNAT entirely, which is Private Endpoint vs Service Endpoint. When egress must be inspected and filtered rather than just translated, that is a different resource — see Azure Firewall: Forced Tunneling & Hub-Spoke Routing. And because SNAT exhaustion shows up on the compute side too, the App Service angle is in Troubleshooting Azure App Service: 502/503, Cold Starts & Restart Loops.

A quick map of where each outbound concern is owned and what it can break, so you call the right person fast:

Layer	What lives here	Who usually owns it	What it can cause
Workload (VM / pod)	Connection pooling, keepalives, retries	App / dev team	Exhaustion via per-request connections
Subnet config	NAT Gateway association, UDRs, NSGs	Platform / network	Wrong egress path, blocked association
NAT Gateway	SNAT pool, idle timeout, attached IPs	Network team	Port ceiling, idle reclaim, zone pinning
Public IP / prefix	The egress identity you publish	Network team	Allow-list churn if not a prefix
Destination (partner)	The single VIP everyone hits	External	The 5-tuple bottleneck that exhausts ports
Monitoring	`SNATConnectionCount`, `DroppedPackets`	Platform / SRE	Blind to exhaustion until users feel it

Core concepts

Five mental models make every later decision obvious.

Source NAT is mandatory for outbound, and somebody must supply the public IP. A private IP cannot appear on the public internet; the return packet would have nowhere to go. So every outbound flow has its source private-IP-and-port rewritten to a public IP and a SNAT port. The only question is which resource provides that public IP and how it allocates the ports — and that is the entire subject of this article.

A SNAT “port” is a translation-table entry keyed on the full 5-tuple. The table is keyed on (source IP, source port, destination IP, destination port, protocol). This is the detail that trips everyone up. You are not limited to 64K total connections; you are limited to 64K connections per public IP to the same destination IP and port. A flow to 20.1.2.3:443 and a flow to 20.1.2.4:443 go to different destinations, so they can share the same SNAT port, whereas two flows to the same 20.1.2.3:443 each need a port of their own. Exhaustion therefore almost always means many connections to one destination: a single payment gateway, one storage public endpoint, one upstream behind a single VIP.
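The 5-tuple keying can be sketched as a toy model. This is an illustration only, not how the Azure data plane is implemented: the `SnatTable` class, the four-port range, and the addresses are all hypothetical, chosen so that exhaustion is visible after a handful of allocations.

```python
from dataclasses import dataclass, field

@dataclass
class SnatTable:
    """Toy SNAT table for one public IP: a port may be reused across
    different destinations, but never twice for the same destination."""
    ports: range = range(1024, 1028)          # deliberately tiny: 4 SNAT ports
    in_use: set = field(default_factory=set)  # {(port, dst_ip, dst_port, proto)}

    def allocate(self, dst_ip: str, dst_port: int, proto: str = "tcp"):
        for port in self.ports:
            key = (port, dst_ip, dst_port, proto)
            if key not in self.in_use:
                self.in_use.add(key)
                return port
        return None  # no free port left for this destination

table = SnatTable()
# Five flows to the same destination: the fifth finds no free port.
same_dest = [table.allocate("20.1.2.3", 443) for _ in range(5)]
# A flow to a different destination can reuse port 1024.
other_dest = table.allocate("20.1.2.4", 443)
print(same_dest, other_dest)
```

The fifth flow to the single destination fails while a flow to a neighbouring address still succeeds, which is the single-destination shape that exhaustion takes.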

NAT Gateway allocates ports on demand from a shared pool; the alternatives pre-carve them. With NAT Gateway, an idle VM in the subnet consumes nothing and a busy one bursts — ports are handed out across all instances in the subnet as needed. With Load Balancer outbound rules you must pre-divide a fixed 64K budget across the backend pool before any traffic flows, and a wrong division either starves VMs or caps scale. Dynamic allocation is the single biggest reason NAT Gateway survives load that exhausts the other two.

Each attached public IP contributes 64,512 SNAT ports, and the maths is linear. One IP gives ~64,512 simultaneous flows to a single destination IP:port; two give ~129,024; the cap of 16 IPs per NAT Gateway gives ~1,032,192. Capacity planning is therefore arithmetic, not guesswork: count your peak concurrent flows to the busiest single destination and divide.

A port is freed only when its flow goes idle past the TCP idle timeout. Until then the entry is held. That is why idle-timeout tuning is part of capacity planning, not an afterthought: a 4-minute timeout recycles short-lived flows aggressively (more effective capacity), while 120 minutes keeps quiet long-lived connections alive (fewer surprise resets) but holds ports longer. The durable fix for connections dying mid-idle is application keepalives, not a longer timeout.

The vocabulary in one table

Pin down every moving part before the deep sections. The glossary repeats these for lookup; this is the mental model side by side:

Concept	One-line definition	Where it lives	Why it matters
Source NAT (SNAT)	Rewriting private src IP:port → public IP:port for egress	Performed by the outbound resource	Mandatory for any internet egress
SNAT port	One translation-table entry, keyed on the 5-tuple	The NAT resource’s table	The finite thing you exhaust
5-tuple	(src IP, src port, dst IP, dst port, protocol)	Per flow	Same destination = shared budget
NAT Gateway	Managed outbound-only resource attached to a subnet	Subnet association	The recommended egress path
Public IP prefix	A contiguous CIDR of public IPs	Attached to the NAT Gateway	The allow-listable egress identity
Default outbound access	Implicit egress every VM gets if nothing is configured	Platform behaviour	Being retired Sept 2025; unpredictable IP
Idle timeout	How long an idle flow holds its SNAT port	NAT Gateway property (4–120 min)	Capacity + connection-reset lever
`outboundType`	AKS cluster egress mode	AKS networking profile	Chooses LB vs NAT Gateway for the cluster
`DroppedPackets`	NAT Gateway metric for dropped flows	Platform metrics	The smoking gun for exhaustion
`SNATConnectionCount`	Established outbound connections through the gateway	Platform metrics	The headline capacity number

Three outbound paths, and why only one scales

Azure provides three ways to give an outbound flow a public IP, and they are not equivalent. Default outbound is the implicit egress with no configuration; Load Balancer outbound rules let you allocate ports explicitly off an LB frontend; NAT Gateway is the purpose-built, subnet-attached resource. The headline comparison:

Outbound method	SNAT ports	Egress IP	Allocation model	Recommended?
Default outbound access	Implicit, small, shared per host	Unpredictable, Microsoft-owned, can change	Platform-managed, opaque	No — retired Sept 2025 for new VNets
Load Balancer outbound rules	64,000 per frontend IP, pre-divided across backends	The LB’s public IP(s)	Manual, static per-instance	Only if you already run the LB
NAT Gateway	~64,512 per attached public IP, on demand	Your public IP / prefix, stable	Dynamic, shared across the subnet	Yes — the default choice

The mechanics that actually decide it:

Dimension	Default outbound	LB outbound rules	NAT Gateway
Port allocation	Implicit, small	Static, pre-carved	Dynamic, on demand
Idle VM cost	Holds a small share	Holds its pre-allocated ports	Consumes nothing
Burst behaviour	Exhausts fast	Capped by pre-division	Bursts from shared pool
Egress-IP stability	None (can change)	Stable (LB frontend)	Stable (your prefix)
Allow-listing	Impossible	Possible (LB IP)	Clean (contiguous CIDR)
Max public IPs	n/a	Multiple frontends	16 per NAT GW
Ports per public IP	Tiny shared slice	64,000 to pre-divide	~64,512 on demand
Couples to inbound?	No	Yes (needs an LB)	No (outbound-only)
Provisioning effort	None	Rule + budget maths	Resource + association
SKU requirement	Any	Standard LB	All Standard in subnet
Zone model	Platform	LB zone behaviour	Zonal (1 GW/zone for HA)
Future-proof	No (retiring)	Yes	Yes (recommended)

Three reading notes that save a design review:

If you are…	The trap	Do this
Relying on default outbound today	It works until it doesn’t, and it is being retired	Treat it as tech debt; add an explicit NAT Gateway
Already running a Standard LB	Tempting to reuse its outbound rules	Fine for small scale; add NAT Gateway if you pre-divide ports
Building a new subnet	New VNets lose implicit egress Sept 2025	Provision NAT Gateway from day one

For any subnet making meaningful outbound calls, the answer is NAT Gateway. The rest of this article assumes that decision.

Understanding SNAT port allocation

A SNAT “port” is one entry in the translation table, and the table is keyed on the full 5-tuple: source IP, source port, destination IP, destination port, protocol. You are not limited to 64K total connections; you are limited to 64K connections to the same destination IP and port. Spreading the same load across many destinations rarely exhausts ports — exhaustion is a single-destination phenomenon.

Each public IP attached to a NAT Gateway contributes 64,512 SNAT ports (some docs round to 64,000). The arithmetic is linear:

SNAT ports available = 64,512  ×  (number of attached public IPs)

1 public IP   ->  ~64,512 simultaneous flows to a single dest IP:port
2 public IPs  ->  ~129,024
16 public IPs ->  ~1,032,192   (16 = the per-NAT-GW IP/prefix cap)

How the same workload consumes the budget depending on how it is written:

Connection pattern	Flows to one dest	SNAT pressure	Typical culprit
New `HttpClient` / socket per request	One per request	Severe — scales with RPS	The classic exhaustion bug
Pooled client, no keepalive	One per pool slot, churns on idle	Moderate	Idle reclaim mid-burst
Pooled client + keepalive	Reused, long-lived	Low — flat under load	The intended pattern
Many destinations (sharded)	Spread across 5-tuples	Low	Naturally exhaustion-resistant
Single VIP, high concurrency	All on one 5-tuple	Severe	Payment/bank/storage endpoint
Retry storm on failure	Multiplies new flows	Severe — self-worsening	Aggressive retry-on-timeout
Short-lived TLS handshakes	New flow per call	High under burst	Webhook/notification fan-out
DNS-resolved to rotating IPs	Spread naturally	Low	CDN-fronted upstreams
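The gap between the per-request row and the pooled rows can be observed on loopback with nothing but the standard library. The sketch below is a local demonstration, not a NAT Gateway test: `CountingHandler` and `count_connections` are hypothetical helpers, and each TCP connection the toy server accepts stands in for one flow that would need a SNAT port.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allow persistent (reusable) connections
    connections = 0
    lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted TCP connection, not once per request.
        super().setup()
        with CountingHandler.lock:
            CountingHandler.connections += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

def count_connections(requests: int, reuse: bool) -> int:
    """Send `requests` GETs and return how many TCP connections the server saw."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        if reuse:
            conn = http.client.HTTPConnection(host, port)
            for _ in range(requests):
                conn.request("GET", "/")
                conn.getresponse().read()
            conn.close()
        else:
            for _ in range(requests):
                conn = http.client.HTTPConnection(host, port)
                conn.request("GET", "/")
                conn.getresponse().read()
                conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections

print("new connection per request:", count_connections(5, reuse=False))
print("one reused connection:     ", count_connections(5, reuse=True))
```

With a connection per request the flow count tracks the request count; with reuse it stays flat no matter how many requests are sent.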

The port lifecycle is what ties capacity to the idle timeout:

Flow state	Holds a SNAT port?	Freed when…	Lever
Active (sending/receiving)	Yes	Connection closes	App connection reuse
Idle but open	Yes	Idle timeout elapses	Idle-timeout setting
TCP `TIME_WAIT`	Briefly	OS reclaim window	Keepalive / fewer new flows
Closed	No	Immediately	—

A worked sizing example, end to end: at 1,800 requests/second with a new connection per request and a ~4-minute TIME_WAIT, roughly 1,800 × 240 s ≈ 432,000 sockets can be in flight against one destination at once. That is why it fails instantly under flash-sale load and never in a unit test, and why the first fix is connection reuse, with NAT Gateway sizing as the safety margin behind it.
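The estimate in that example is arrival rate multiplied by hold time (Little's law). A minimal sketch, assuming a steady arrival rate and that every entry is held for the full ~4 minutes quoted in the text; `ports_held` is a hypothetical helper name.

```python
import math

PORTS_PER_IP = 64_512  # SNAT ports contributed by each attached public IP

def ports_held(new_flows_per_sec: float, hold_seconds: float) -> int:
    """Steady-state entries held = arrival rate x time each entry is held."""
    return math.ceil(new_flows_per_sec * hold_seconds)

# New connection per request, each held ~4 minutes before the port is reusable.
per_request = ports_held(1_800, 4 * 60)
ips_needed = math.ceil(per_request / PORTS_PER_IP)

print(per_request)  # 432000
print(ips_needed)   # 7
```

By contrast, a client that reuses a fixed pool of connections holds one entry per pool slot regardless of request rate, which is why pooling comes before adding IPs.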

Provision the NAT Gateway, public IP, and prefix

You need three resources: the NAT Gateway, at least one Standard-SKU public IP (or a public IP prefix), and a subnet association. Prefer a prefix to individual IPs: the contiguous CIDR is exactly what you publish to partners for allow-listing, and you can scale within it without changing what they allow-list.

Azure CLI:

LOC=eastus
RG=rg-egress-prod

# A /28 prefix = 16 contiguous IPs. Pick the size from the sizing section.
az network public-ip prefix create \
  --resource-group $RG \
  --name pip-prefix-natgw \
  --length 28 \
  --location $LOC \
  --version IPv4

# NAT Gateway. Idle timeout in minutes (default 4, max 120).
az network nat gateway create \
  --resource-group $RG \
  --name natgw-prod \
  --location $LOC \
  --public-ip-prefixes pip-prefix-natgw \
  --idle-timeout 10

The same thing in Bicep, which is what you actually want in the repo:

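// Deployment region; defaults to the resource group's location.
param location string = resourceGroup().location

resource pipPrefix 'Microsoft.Network/publicIPPrefixes@2023-11-01' = {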
  name: 'pip-prefix-natgw'
  location: location
  sku: { name: 'Standard' }
  properties: {
    prefixLength: 28
    publicIPAddressVersion: 'IPv4'
  }
}

resource natgw 'Microsoft.Network/natGateways@2023-11-01' = {
  name: 'natgw-prod'
  location: location
  sku: { name: 'Standard' }
  properties: {
    idleTimeoutInMinutes: 10
    publicIpPrefixes: [
      { id: pipPrefix.id }
    ]
  }
}

NAT Gateway has exactly one SKU (Standard), so there is no SKU decision to make — only IP count and idle timeout. The complete property surface:

Property	Values	Default	When to change	Trade-off / limit
`sku.name`	`Standard` only	`Standard`	Never (no choice)	Cannot attach to Basic-SKU resources
`idleTimeoutInMinutes`	4–120	4	Long-lived idle flows being reset	Higher holds ports longer
`publicIpAddresses`	0–16 individual IPs	none	Need fixed single IPs	Counts toward the 16-IP cap
`publicIpPrefixes`	prefix(es) totalling ≤16 IPs	none	Allow-listable CIDR (preferred)	Largest single prefix is /28
`zones`	none / `1` / `2` / `3`	none (regional)	Pin egress to a zone for AZ design	One GW cannot span zones
`subnets`	association(s)	none	Attach to workload subnet(s)	A subnet has only one NAT GW

Public IP vs public IP prefix — pick the prefix for anything partners allow-list:

Aspect	Individual public IP(s)	Public IP prefix
Allow-listing	Each IP listed separately	One contiguous CIDR
Scaling egress	Add/remove single IPs	Grow within the prefix
Partner churn	Allow-list changes per IP	Allow-list never changes
Granularity	Exact count	Powers of two (/31…/28)
Best for	One or two fixed IPs	Production egress identity

Zone behaviour you must design around — NAT Gateway is a zonal resource:

Deployment	`--zone`	Resilience	Use when
Non-zonal (regional)	omitted	Single regional resource	Dev / non-HA egress
Zonal	`1`, `2`, or `3`	Pinned to one zone’s fate	Part of a per-zone HA design
Zone-redundant egress	one GW per zone	Survives a zone loss	Production multi-zone workloads

A single NAT Gateway cannot span zones. For zone-redundant egress you deploy one NAT Gateway per zone, each on its own subnet — covered in the AKS section, where it matters most.

Associate to subnets and the precedence rules

A NAT Gateway is attached to a subnet, and once attached it becomes the outbound path for every resource in that subnet. This is where the ordering rules matter, because they decide what wins when multiple egress configs coexist.

az network vnet subnet update \
  --resource-group $RG \
  --vnet-name vnet-prod \
  --name snet-workload \
  --nat-gateway natgw-prod

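// Assumes the VNet already exists and natgw is declared in this file.
resource vnet 'Microsoft.Network/virtualNetworks@2023-11-01' existing = {
  name: 'vnet-prod'
}

resource subnet 'Microsoft.Network/virtualNetworks/subnets@2023-11-01' = {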
  parent: vnet
  name: 'snet-workload'
  properties: {
    addressPrefix: '10.20.1.0/24'
    natGateway: { id: natgw.id }
  }
}

The precedence Azure applies for outbound, highest to lowest:

Priority	Outbound method	Wins over	Notes
1	NAT Gateway on the subnet	Everything below	Overrides LB outbound rules and instance-level IP for egress
2	Instance-level public IP (on the NIC)	LB rules, default	Used for outbound only if no NAT Gateway
3	Load Balancer outbound rules	Default	Used only if neither above applies
4	Default outbound access	—	Last resort; retiring for new VNets
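The precedence table reads naturally as a first-match chain. The sketch below encodes only the ordering stated in the table; `EgressConfig` and `effective_outbound` are hypothetical names, and the model ignores everything else about a real deployment.

```python
from dataclasses import dataclass

@dataclass
class EgressConfig:
    subnet_has_nat_gateway: bool = False
    nic_has_public_ip: bool = False
    in_lb_outbound_rule_pool: bool = False

def effective_outbound(cfg: EgressConfig) -> str:
    """Return the outbound method that wins, checked highest priority first."""
    if cfg.subnet_has_nat_gateway:
        return "nat_gateway"
    if cfg.nic_has_public_ip:
        return "instance_public_ip"
    if cfg.in_lb_outbound_rule_pool:
        return "lb_outbound_rules"
    return "default_outbound"

# A NIC with its own public IP still egresses via the NAT Gateway.
print(effective_outbound(EgressConfig(subnet_has_nat_gateway=True,
                                      nic_has_public_ip=True)))
```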

Two rules that save you a debugging session:

Rule	What it means	Consequence if ignored
NAT Gateway wins outbound even if a NIC has its own public IP	Inbound stays on the NIC IP; outbound goes via NAT GW	Surprised that egress IP “changed” after attaching NAT GW
Everything in the subnet must be Standard SKU	No Basic LB or Basic public IP allowed	Association silently blocked / unsupported

Association scope and limits, enumerated:

Capability	Allowed?	Detail
One NAT GW → many subnets (same VNet)	Yes	Shares the SNAT pool across them
One subnet → many NAT GWs	No	A subnet has exactly one NAT GW
NAT GW → subnets in different VNets	No	Same-VNet only
Subnet contains a Basic-SKU resource	No	Must be all Standard SKU
Attach to a gateway subnet (VPN/ER)	No	Not supported on gateway subnets
Coexist with NSG / UDR on the subnet	Yes	NSG/UDR still apply to the flow
Coexist with a Standard LB (inbound)	Yes	NAT GW handles outbound, LB inbound
Inbound on a NIC’s own public IP	Yes	Inbound stays on the NIC IP
Span availability zones with one GW	No	Zonal; one GW per zone for HA
Reuse one prefix across many GWs	No	A prefix attaches to one GW
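Several of the "No" rows above can be expressed as a pre-flight check to run before attempting an association. This is a sketch of the rules as the table states them, not a call into any Azure API; the `Subnet` dataclass and `association_blockers` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Subnet:
    vnet: str
    name: str
    is_gateway_subnet: bool = False
    has_basic_sku_resource: bool = False
    nat_gateway: Optional[str] = None  # name of a NAT GW already attached

def association_blockers(subnet: Subnet, natgw_vnet: Optional[str]) -> List[str]:
    """Reasons the association would be rejected; an empty list means allowed.

    natgw_vnet is the VNet of the subnets the gateway already serves, if any.
    """
    blockers = []
    if subnet.nat_gateway is not None:
        blockers.append("subnet already has a NAT Gateway")
    if natgw_vnet is not None and natgw_vnet != subnet.vnet:
        blockers.append("NAT Gateway already serves a different VNet")
    if subnet.has_basic_sku_resource:
        blockers.append("subnet contains a Basic-SKU resource")
    if subnet.is_gateway_subnet:
        blockers.append("gateway subnets are not supported")
    return blockers

workload = Subnet(vnet="vnet-prod", name="snet-workload")
print(association_blockers(workload, natgw_vnet="vnet-prod"))  # []
```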

Size the prefix to your connection count

This is the capacity-planning step people skip and then page about. Work backwards from the worst-case concurrent flows to a single destination:

required public IPs = ceil( peak_concurrent_flows_to_one_dest / 64,512 )
prefix length       = smallest /N whose host count >= required IPs
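The two formulas translate directly into a small calculator. A minimal sketch, assuming the 64,512 ports-per-IP figure and the 16-IP ceiling used throughout this article; the function names are hypothetical, and /31 is treated as the smallest prefix because that is the smallest size this article provisions.

```python
import math

PORTS_PER_IP = 64_512
MAX_IPS_PER_NAT_GATEWAY = 16

def required_ips(peak_flows_to_one_dest: int) -> int:
    """ceil(peak concurrent flows to the busiest destination / ports per IP)."""
    return max(1, math.ceil(peak_flows_to_one_dest / PORTS_PER_IP))

def prefix_length(ips: int) -> int:
    """Smallest IPv4 block (/31 to /28) whose address count covers `ips`."""
    if ips > MAX_IPS_PER_NAT_GATEWAY:
        raise ValueError("more than 16 IPs: split across subnets and NAT Gateways")
    bits = max(1, (ips - 1).bit_length())  # host bits needed, at least 1 (/31)
    return 32 - bits

def headroom(peak_flows: int, length: int) -> float:
    """Provisioned ports divided by peak flows for a prefix of the given length."""
    return (2 ** (32 - length)) * PORTS_PER_IP / peak_flows

ips = required_ips(140_000)
length = prefix_length(ips)
print(ips, f"/{length}", round(headroom(140_000, length), 1))  # 3 /30 1.8
```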

The prefix-to-capacity table:

Prefix	IPs	Approx SNAT ports (to one dest:port)	Covers peak flows up to
/31	2	~129,024	~129K
/30	4	~258,048	~258K
/29	8	~516,096	~516K
/28	16	~1,032,192	~1.03M

The hard ceiling: a single NAT Gateway supports a maximum of 16 public IP addresses (individual IPs and prefixes combined). A /28 is the largest single prefix that fits, giving ~1.03M ports. Need more than 16 IPs to one destination? Split the workload across multiple subnets, each with its own NAT Gateway and prefix — the documented scaling pattern, not a workaround.

The limits and fixed numbers worth keeping on one screen:

Limit / quantity	Value	Why it matters
SNAT ports per public IP	~64,512	The divisor in all sizing maths
Max public IPs per NAT Gateway	16	Hard ceiling; ~1.03M ports max
Largest single prefix on one GW	/28 (16 IPs)	The biggest contiguous block per GW
Idle timeout range	4–120 minutes	Capacity vs reset trade-off
Default idle timeout	4 minutes	Aggressive reclaim out of the box
SKU choices	`Standard` only	No SKU decision to make
NAT GWs per subnet	1	A subnet has exactly one
Subnets per NAT GW	Many (same VNet)	Shares the pool across them
Zones a single GW spans	1	Zonal; needs one per zone for HA
Default-outbound retirement (new VNets)	30 Sep 2025	Why new subnets need explicit egress

Worked sizing for four realistic workloads, plus the multi-zone and beyond-/28 cases:

Workload	Peak flows to one dest	`ceil(/64,512)`	Prefix to provision	Headroom
Webhook fan-out	~50,000	1 IP	/31 (2 IPs)	~2.5×
Notification service	~140,000	3 IPs	/30 (4 IPs)	~1.8×
Bulk reconciliation	~400,000	7 IPs	/29 (8 IPs)	~1.3×
Multi-tenant scraper	~900,000	14 IPs	/28 (16 IPs)	~1.1×
Multi-zone notification	~120K per zone	2 IPs/zone	/30 per zone	per-zone GW
Beyond /28	> 1.03M	> 16 IPs	Split across subnets	per-subnet GW

Worked example in prose: a notification service holds ~140,000 simultaneous TLS connections to one provider VIP at peak. ceil(140000 / 64512) = 3 IPs. A /30 (4 IPs) covers it with headroom; provision it and hand the partner that 4-address CIDR. Do not provision a /28 “to be safe” — you pay per IP, and you can attach a second prefix later without downtime. Sizing anti-patterns to avoid:

Anti-pattern	Why it hurts	Better
Provision /28 “to be safe”	Pay for 16 IPs you do not use	Size to peak + modest headroom
Size by total connections, not per-destination	Over- or under-provisions wildly	Count flows to the busiest single dest
One giant prefix beyond /28	Exceeds the 16-IP cap	Split across subnets/NAT GWs
Loose individual IPs for a partner	Allow-list churns on scale	Use a contiguous prefix
Ignore idle timeout in the maths	Held ports inflate live count	Tune idle timeout + keepalives

Integrating with AKS for stable, allow-listable egress

AKS defaults its outboundType to loadBalancer, which puts cluster egress behind the Standard LB and inherits its SNAT-allocation headaches at high pod density. Switching to managedNATGateway (Azure provisions and owns the gateway) or userAssignedNATGateway (you bring your own on the node subnet) fixes both the exhaustion and the IP-stability problems in one move.

This must be chosen at cluster creation — outboundType is largely immutable, with only specific migration paths supported — so design it in from day one for any cluster that calls allow-listed external endpoints. The four modes:

`outboundType`	Who owns the NAT GW	Egress IP control	SNAT scaling	Choose when
`loadBalancer` (default)	AKS-managed LB	LB outbound IPs	LB pre-division limits	Low egress concurrency only
`managedNATGateway`	Azure provisions it	Azure-allocated IP(s)	On demand, large pool	You want NAT GW without owning IPs
`userAssignedNATGateway`	You (on node subnet)	Your exact prefix	On demand, large pool	Partner allow-lists your CIDR
`userDefinedRouting`	You (via UDR/firewall)	Firewall/NVA IP	N/A (egress through NVA)	Egress must be inspected/filtered

Managed NAT Gateway (Azure provisions and owns it):

az aks create \
  --resource-group rg-aks-prod \
  --name aks-prod \
  --network-plugin azure \
  --outbound-type managedNATGateway \
  --nat-gateway-managed-outbound-ip-count 2 \
  --nat-gateway-idle-timeout 10 \
  --generate-ssh-keys

User-assigned, when you must control the exact egress prefix (the common enterprise case — partners allow-list your CIDR):

# NAT Gateway + prefix already created and attached to the AKS node subnet,
# then point the cluster at that subnet with a userAssignedNATGateway type.
az aks create \
  --resource-group rg-aks-prod \
  --name aks-prod \
  --network-plugin azure \
  --vnet-subnet-id "$NODE_SUBNET_ID" \
  --outbound-type userAssignedNATGateway \
  --generate-ssh-keys

The AKS-specific knobs and their constraints:

Flag / setting	What it controls	Default	Constraint
`--outbound-type`	Cluster egress mode	`loadBalancer`	Set at create; limited migration paths
`--nat-gateway-managed-outbound-ip-count`	Managed-mode IP count	1	Up to 16; each is 64,512 ports
`--nat-gateway-idle-timeout`	Managed-mode idle timeout	4	4–120 minutes
`--vnet-subnet-id` (user-assigned)	Node subnet with your NAT GW	—	NAT GW must be pre-attached
`--network-plugin`	CNI choice	`azure`/`kubenet`	Egress design independent of plugin

For a zone-redundant cluster the node pools span zones, but a NAT Gateway is single-zone. The correct topology is one node-pool subnet per availability zone, each with its own zonal NAT Gateway and prefix. Pods in zone 1 egress through the zone-1 NAT Gateway, and so on; partners allow-list the union of the zonal prefixes. Do not share one NAT Gateway across a multi-zone node subnet — it pins your egress to a single zone’s fate. The zone-redundant layout:

Zone	Node-pool subnet	Zonal NAT GW	Prefix	Egress IPs published
1	`snet-aks-z1`	`natgw-z1` (`--zone 1`)	`/30`	CIDR-1
2	`snet-aks-z2`	`natgw-z2` (`--zone 2`)	`/30`	CIDR-2
3	`snet-aks-z3`	`natgw-z3` (`--zone 3`)	`/30`	CIDR-3
Partner allow-list	—	—	—	Union of CIDR-1…3
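The "union of CIDR-1…3" that partners receive can be produced mechanically from the zonal prefixes. A sketch using the standard `ipaddress` module; the three /30 blocks are placeholder documentation ranges, not real allocations, and `partner_allow_list` is a hypothetical helper.

```python
import ipaddress

def partner_allow_list(zonal_prefixes):
    """Merge the per-zone prefixes into the shortest equivalent CIDR list."""
    nets = [ipaddress.ip_network(p) for p in zonal_prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

# Placeholder prefixes, one per zone (documentation address ranges).
zonal = {
    "1": "203.0.113.0/30",
    "2": "198.51.100.8/30",
    "3": "192.0.2.16/30",
}

allow_list = partner_allow_list(zonal.values())
total_ips = sum(ipaddress.ip_network(p).num_addresses for p in zonal.values())
print(allow_list, total_ips)
```

Zonal prefixes are allocated independently, so they are rarely adjacent and the partner usually receives three separate entries. When two blocks do happen to be adjacent and aligned, `collapse_addresses` merges them into one.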

More on the day-two operational side of clusters like this is in AKS Day-Two: Upgrades & Fleet Operations.

Tune TCP idle timeout and watch the metrics

The TCP idle timeout governs how long an idle flow holds its SNAT port before reclaim. Lowering it (minimum 4 minutes) frees ports faster and raises effective capacity. Raising it (up to 120 minutes) keeps quiet but long-lived connections alive — useful for databases or brokers that idle between bursts but should not be torn down.

az network nat gateway update \
  --resource-group $RG \
  --name natgw-prod \
  --idle-timeout 4

The idle-timeout decision, both directions:

Setting	Effect on capacity	Effect on long-lived flows	Pick when
4 min (minimum)	Frees ports fastest	May reset quiet connections	High-churn, short-lived flows
10 min (common default-bump)	Balanced	Tolerates brief idleness	General production
30–60 min	Holds ports longer	Keeps brokers/DBs alive	Bursty long-lived sessions
120 min (maximum)	Holds ports longest	Rarely resets anything	Last resort; prefer keepalives

Do not solve idle-timeout pain by cranking it to 120. The durable fix for connections dying mid-idle is application-level TCP keepalives (or HTTP keep-alive / connection pooling) that send traffic before the timeout. Keepalives reset the idle timer and are far more robust than betting that no flow ever idles longer than your window. Idle timeout vs keepalive, head to head:

Approach	Where it lives	Robustness	Side effect
Raise NAT idle timeout	NAT Gateway property	Bet on no flow idling longer	Holds ports; lowers capacity
App TCP keepalive	Socket options / client	Resets the timer reliably	Tiny keepalive traffic
HTTP keep-alive / pooling	HTTP client config	Reuses connections, fewer flows	Needs correct pool sizing
Both (recommended)	NAT + app	Most robust	Minimal
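On the application side, TCP keepalive is a handful of socket options. A minimal sketch, assuming a 4-minute NAT idle timeout and therefore a first probe well inside it; `enable_keepalive` is a hypothetical helper, the timings are illustrative, and the tuning constants differ by operating system, so the code sets only those the platform exposes.

```python
import socket

def enable_keepalive(sock: socket.socket, idle_s: int = 120,
                     interval_s: int = 30, probes: int = 4) -> socket.socket:
    """Turn on TCP keepalive so an idle flow sends traffic before the NAT
    idle timeout (4 minutes by default) reclaims its SNAT port."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning is platform-specific; apply only what is available.
    if hasattr(socket, "TCP_KEEPIDLE"):        # Linux: idle seconds before first probe
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    elif hasattr(socket, "TCP_KEEPALIVE"):     # macOS equivalent of TCP_KEEPIDLE
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):       # seconds between probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):         # failed probes before giving up
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock

s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
s.close()
```

Most HTTP and database clients expose an equivalent setting; the point is that the first probe fires comfortably before the gateway's idle window closes.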

NAT Gateway emits metrics you should alert on, in Microsoft.Network/natGateways:

Metric	What it measures	Alert when	Why it matters
`SNATConnectionCount`	Established outbound connections	Approaching ceiling	The headline capacity number
`TotalConnectionCount`	Active flows through the gateway	Sustained near limit	Corroborates pressure
`DroppedPackets` (metric ID `PacketDropCount`)	Packets/flows dropped	> 0 sustained	The smoking gun for exhaustion
`PacketCount`	Packets processed	Throughput baseline	Capacity/throughput trend
`ByteCount`	Bytes processed	Egress-cost tracking	Bill driver (per-GB)
`SNATConnectionCount` (by connection state)	Attempted vs failed SNAT connections	Failed > 0	Failed connections point at exhaustion rather than a slow upstream
`DatapathAvailability`	Gateway health	Below 100%	Rules out a platform-side fault

A KQL query against the platform metrics (or an Azure Monitor metric alert) to catch exhaustion before users do:

// "Dropped Packets" is the portal display name; the metric ID is PacketDropCount.
AzureMetrics
| where ResourceProvider == "MICROSOFT.NETWORK"
| where Resource == "NATGW-PROD"
| where MetricName in ("PacketDropCount", "SNATConnectionCount")
| summarize Total = sum(Total) by bin(TimeGenerated, 5m), MetricName
| order by TimeGenerated desc

Which changes are online (no egress interruption) and which force a heavier operation is worth knowing before an incident, because the in-the-moment capacity bumps are all online:

Change	Disruptive?	How	Use during an incident?
Attach another public IP or prefix to the NAT GW	No	`az network nat gateway update --public-ip-prefixes <existing> <new>`	Yes (fastest capacity bump; a prefix cannot be resized after creation)
Lower the idle timeout	No	`az network nat gateway update --idle-timeout`	Yes — frees ports faster
Raise the idle timeout	No	same flag	Yes — but prefer keepalives
Attach NAT GW to another subnet	No	subnet `update --nat-gateway`	Yes — extend coverage
Detach NAT GW from a subnet	Egress reverts	subnet `update --remove natGateway`	Cautiously — egress path changes
Shrink/replace the prefix	Yes	recreate prefix	No — plan it
Change AKS `outboundType`	Yes (recreate)	cluster recreate / migration path	No — design up front

Set a metric alert on DroppedPackets > 0 over 5 minutes. Sustained drops strongly suggest you are at the ceiling: attach another public IP or prefix to the NAT Gateway (no downtime) or lower the idle timeout, then re-check. Recommended starting thresholds:

Alert	Metric	Threshold (starting point)	Action on fire
Exhaustion (hard)	`DroppedPackets`	> 0 for 5 min	Attach another IP or prefix; lower idle timeout
Capacity creep	`SNATConnectionCount`	> 80% of `64,512 × IPs`	Plan a prefix bump
Flow surge	`TotalConnectionCount`	> your modelled peak	Investigate connection reuse
Egress cost	`ByteCount`	> budget for the month	Review chatty callers / data path
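The capacity-creep threshold in the table is a percentage of `64,512 × IPs`, so the number you type into the alert depends on how many public IPs are attached. A minimal sketch of that arithmetic follows; the helper name `snat_alert_threshold` is hypothetical and the 80% default is the starting point suggested above, not a platform setting.

```shell
#!/bin/sh
# 64,512 SNAT ports per attached public IP (figure used throughout this article).
PORTS_PER_IP=64512

# Hypothetical helper: connection count at which the capacity alert should fire.
#   $1 = number of public IPs attached to the NAT Gateway
#   $2 = alert percentage (defaults to 80)
snat_alert_threshold() {
  echo $(( PORTS_PER_IP * $1 * ${2:-80} / 100 ))
}

snat_alert_threshold 4      # /30 prefix at 80%  -> 206438
snat_alert_threshold 16 90  # /28 prefix at 90%  -> 928972
```

Recompute the threshold whenever the attached IP count changes; an alert tuned for four IPs fires far too early once the gateway has eight.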

The deeper observability story — workbooks, alert routing, action groups — is in Azure Monitor Deep Dive: Every Option.

Architecture at a glance

Follow a single outbound request left to right and the whole design clicks into place. A workload — a VM or, more often, a fleet of AKS pods — sits in a workload subnet and opens a TCP connection to an external API. Because the subnet has a NAT Gateway associated, the platform intercepts that flow at egress and performs source NAT: the pod’s private 10.20.1.x address and ephemeral port are rewritten to one of the public IPs in the attached prefix (say 203.0.113.0/30) and an allocated SNAT port from the 64,512-per-IP pool. The translated packet leaves through the NAT Gateway’s stable public identity and arrives at the destination — which, critically, is usually a single VIP (one payment gateway, one bank API). The partner sees a source IP inside the prefix it already allow-listed, and the return traffic flows back through the same translation.

The diagram makes the failure map explicit. Badge 1 marks the workload tier, where a per-request connection pattern manufactures the exhaustion in the first place — the fix lives in code (pooling and keepalives), not in the network. Badge 2 sits on the NAT Gateway, the resource whose SNATConnectionCount and DroppedPackets you watch and whose idle timeout and attached-IP count you tune. Badge 3 is on the public IP prefix — the allow-listable CIDR that must stay stable as you scale within it. Badge 4 is on the single destination VIP, the 5-tuple bottleneck that turns “lots of connections” into “exhaustion,” because every flow shares one (dst IP, dst port). Badge 5 maps the zone-redundant concern: a NAT Gateway is single-zone, so a multi-zone cluster needs one gateway per zone or its egress pins to a single zone’s fate. Trace those five numbered points and you can localise any egress incident to exactly one of them.

Real-world scenario

A fintech platform team — call them PaySettle — ran a payment-reconciliation service on AKS that, at end-of-day batch, opened tens of thousands of short-lived HTTPS connections to a single acquiring-bank API behind one VIP. The cluster used the default outboundType: loadBalancer. Around 18:00 daily, reconciliation jobs failed with connection timeouts and recovered on their own by 18:30. Nobody could reproduce it off-peak, and three sprints of “add retries” and “increase the timeout” had changed nothing.

The constraint was twofold. The bank required PaySettle to allow-list a fixed set of source IPs, so any fix had to produce a stable, declared egress CIDR — ruling out default outbound entirely. And the LB outbound rules pre-allocated SNAT ports across the node pool; because every connection targeted the same destination IP and port, the 5-tuple table for that one destination hit its ceiling exactly when batch concurrency peaked. The dropped-flow window lined up perfectly with the failures. The incident timeline made the mechanism unmistakable:

Time	Observation	Signal	Interpretation
17:55	Batch ramps up	`TotalConnectionCount` climbing	Concurrency rising toward peak
18:02	First timeouts	`DroppedPackets` > 0	SNAT ceiling hit for the bank VIP
18:10	Job retries storm	More new connections	Retries worsen exhaustion
18:28	Batch tapers	`DroppedPackets` → 0	Ports freed; “self-heals”
Next day	Same window	Identical shape	Deterministic, load-driven

They rebuilt the node pools on subnets fronted by a user-assigned NAT Gateway with a /30 prefix (4 IPs, ~258K ports to that destination, roughly double the observed peak), handed the bank that 4-address CIDR, and dropped the idle timeout to 4 minutes to recycle the short-lived flows aggressively. As the cluster was zone-redundant, they provisioned one zonal NAT Gateway and prefix per zone and gave the bank the union of the three CIDRs. They also fixed the application to reuse a pooled HTTPS client with keepalives, so the live flow count fell even before the extra ports mattered.

# Per zone: dedicated subnet, zonal NAT GW, /30 prefix on the node subnet.
az network public-ip prefix create -g rg-pay -n pip-prefix-z1 --length 30 --location eastus --zone 1
az network nat gateway create -g rg-pay -n natgw-z1 --location eastus \
  --public-ip-prefixes pip-prefix-z1 --idle-timeout 4 --zone 1
az network vnet subnet update -g rg-pay --vnet-name vnet-pay \
  --name snet-aks-z1 --nat-gateway natgw-z1

The 18:00 failures stopped on the first batch after cutover. DroppedPackets has been flat at zero since, and the bank’s allow-list never has to change because all future scaling happens within the published prefixes. Before-and-after, the change was stark:

Metric	Before (LB outbound)	After (NAT GW /30 per zone)
Egress IP	LB frontend, shared	3 stable /30 prefixes
Ports to bank VIP	Pre-divided, capped	~258K per zone, on demand
18:00 failures	Daily, ~30 min	Zero
`DroppedPackets`	Non-zero at peak	Flat at zero
Allow-list churn	Risk on every scale	None (scale within prefix)
Idle timeout	LB default	4 min + app keepalives

Advantages and disadvantages

NAT Gateway is the right default for egress, but it is not free of trade-offs. The explicit two-column view:

Advantages	Disadvantages
On-demand SNAT from a large shared pool (no pre-carving)	Outbound-only — does not handle inbound at all
Stable, allow-listable public IP / prefix	Zonal resource — multi-zone HA needs one GW per zone
Survives load that exhausts default / LB outbound	Adds an hourly + per-GB cost vs free default outbound
Fully managed, highly available, single SKU	16-IP-per-GW cap; beyond that you split subnets
Decouples egress identity from inbound resources	All subnet resources must be Standard SKU
Simple association model (attach to subnet)	No L7 features (no filtering/inspection — that is Azure Firewall)
AKS-native via `outboundType`	`outboundType` largely immutable post-create

When each side matters:

Decision factor	Favours NAT Gateway	Favours an alternative
Need stable egress IP to allow-list	Strongly	—
High concurrency to one destination	Strongly	—
Need to filter/inspect egress	—	Azure Firewall (forced tunneling)
Egress is to Azure PaaS only	Maybe	Private Endpoint (bypasses SNAT)
Already run a Standard LB at small scale	Optional	Reuse LB outbound rules
New VNet (post Sep 2025)	Yes	— (default outbound unavailable)
AKS at high pod density	Strongly	—
Need zone-redundant egress	Yes (one GW/zone)	—
Lowest possible cost, non-prod	—	Default outbound (while it lasts)

For egress that must be inspected rather than merely translated, NAT Gateway is the wrong tool — that is Azure Firewall: Forced Tunneling & Hub-Spoke Routing. For traffic to Azure PaaS that should never touch the public internet at all, prefer Private Endpoint vs Service Endpoint.

Hands-on lab

Provision a NAT Gateway with a prefix, attach it to a subnet, and prove your egress IP is the one you provisioned — free-tier-friendly and fully torn down at the end. Run in Cloud Shell (Bash).

Step 1 — Variables and resource group.

RG=rg-natgw-lab
LOC=centralindia
VNET=vnet-lab
SUBNET=snet-workload
NATGW=natgw-lab
PREFIX=pip-prefix-lab
az group create -n $RG -l $LOC -o table

Step 2 — Create a VNet and a workload subnet.

az network vnet create -g $RG -n $VNET --address-prefix 10.20.0.0/16 \
  --subnet-name $SUBNET --subnet-prefix 10.20.1.0/24 -o table

Expected: a VNet with one subnet on 10.20.1.0/24.

Step 3 — Create a /31 public IP prefix (2 IPs — plenty for a lab).

az network public-ip prefix create -g $RG -n $PREFIX --length 31 --location $LOC --version IPv4 -o table

Expected: a prefix resource showing an ipPrefix like x.x.x.x/31.

Step 4 — Create the NAT Gateway with the prefix and a 4-minute idle timeout.

az network nat gateway create -g $RG -n $NATGW --location $LOC \
  --public-ip-prefixes $PREFIX --idle-timeout 4 -o table

Expected: a NAT Gateway, sku.name = Standard, idleTimeoutInMinutes = 4.

Step 5 — Associate the NAT Gateway to the subnet.

az network vnet subnet update -g $RG --vnet-name $VNET --name $SUBNET --nat-gateway $NATGW -o table

Step 6 — Confirm the association and attached IPs from the control plane.

az network nat gateway show -g $RG -n $NATGW \
  --query "{idle:idleTimeoutInMinutes, prefixes:publicIpPrefixes[].id, subnets:subnets[].id}" -o jsonc

Expected: idle: 4, your prefix listed, and the workload subnet listed under subnets.

Step 7 — Prove egress (optional, needs a VM in the subnet). Deploy a tiny VM into snet-workload, then from inside it the echo services must return an IP from your prefix:

# From a VM/debug pod INSIDE the subnet — should return an IP from your prefix, every time.
curl -s https://ifconfig.me ; echo
curl -s https://api.ipify.org ; echo
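Eyeballing whether the echoed address sits inside your prefix is error-prone once the prefix is larger than a /31. The following is a rough sketch of an IPv4 membership check in plain shell arithmetic; `ip_to_int` and `ip_in_cidr` are hypothetical helper names, and the addresses shown are documentation-range examples rather than real lab values.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The body runs in a subshell so the IFS change does not leak to the caller.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Hypothetical helper: succeed if IPv4 address $1 lies inside CIDR block $2.
ip_in_cidr() {
  net=${2%/*}   # network part, e.g. 203.0.113.0
  len=${2#*/}   # prefix length, e.g. 30
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Example with documentation addresses (not real lab output).
if ip_in_cidr 203.0.113.2 203.0.113.0/30; then
  echo "inside prefix"
else
  echo "outside prefix"
fi
```

In the lab you would pass the address returned by the echo service as the first argument and the `ipPrefix` value from Step 3 as the second.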

Validation checklist. You provisioned a NAT Gateway with an allow-listable prefix, attached it to a subnet, confirmed the association from the control plane, and (optionally) verified the egress IP falls inside your prefix. The steps mapped to what each proves:

Step	What you did	What it proves
3	Create a public IP prefix	The contiguous CIDR you would publish
4	Create NAT GW + idle timeout	Single SKU; idle timeout is the lever
5	Associate to subnet	Egress for the whole subnet now flows through it
6	Show association	The control-plane confirmation path
7	Egress echo from inside	The egress IP is your prefix, not default outbound

Cleanup (avoid lingering charges).

az group delete -n $RG --yes --no-wait

Cost note. A NAT Gateway plus a tiny prefix is a few rupees per hour; an hour of this lab is well under ₹50, and deleting the resource group stops everything. Skip Step 7’s VM and the lab is nearly free.

Common mistakes & troubleshooting

This is the playbook — the part you bookmark. First the error strings exhaustion actually produces (it rarely says “SNAT” — it shows up as generic connection failures), then the scannable symptom table, then the expanded reasoning for the entries that bite hardest.

SNAT exhaustion is an outbound failure, so the error surfaces in your application’s client, not in an HTTP status from the platform. The strings to recognise:

Observed error (client side)	What it usually means	How to confirm it is SNAT	First fix
`connection timed out` to one host under load	New flow could not get a port	`DroppedPackets` > 0 at that time	Connection reuse; attach another IP to the NAT GW
`connection reset by peer` mid-session	Idle flow’s port reclaimed	Resets cluster at the idle-timeout interval	Keepalives; raise idle timeout modestly
`EADDRNOTAVAIL` / “address not available”	OS/port allocation pressure	Many flows to one dst; `SNATConnectionCount` high	Pooled client; fewer concurrent flows
Sporadic TLS handshake failures at peak	New handshakes starved of ports	Failure window == load peak	Size prefix to peak; shard destinations
5xx from your service calling an upstream	Upstream call failed, not the upstream itself	App Insights dependency failures to one target	Fix the dependency client, not the upstream
Intermittent DNS-then-connect failures	Connect phase starved, DNS fine	Connect errors spike, DNS resolves	Reuse connections; add capacity

The symptom-to-fix table:

#	Symptom	Root cause	Confirm (exact cmd / portal path)	Fix
1	Intermittent timeouts/5xx to one upstream under load, fine at rest	SNAT exhaustion on one 5-tuple destination	Metrics: `DroppedPackets` > 0; `SNATConnectionCount` near `64,512×IPs`	Connection reuse + attach another IP to the NAT GW
2	Egress IP is not the prefix you provisioned	Subnet not associated, or NAT GW precedence not in effect	`az network nat gateway show --query subnets`; `curl ifconfig.me` from inside	Associate the subnet; remove conflicting NIC IP assumptions
3	Association fails / unsupported	A Basic-SKU resource in the subnet	`az network public-ip list` / LB SKU = Basic	Upgrade everything to Standard SKU
4	Long-lived connections reset mid-idle	Idle timeout shorter than the idle gap	NAT GW `idleTimeoutInMinutes`; app logs show resets at the interval	App keepalives; modestly raise idle timeout
5	Exhaustion despite “plenty of ports”	All flows hit one dest IP:port (single 5-tuple)	App Insights `dependencies` by `target`; one host dominates	Shard destinations or reuse connections
6	AKS egress IP still LB, not NAT GW	`outboundType` left at default `loadBalancer`	`az aks show --query networkProfile.outboundType`	Recreate cluster with managed/user-assigned NAT GW
7	Cannot change AKS `outboundType` after create	Property is largely immutable	`az aks update` rejects the change	Plan it at creation; use a supported migration path
8	Multi-zone cluster egress pinned to one zone	One NAT GW shared across a multi-zone subnet	NAT GW `zones`; pods in other zones lose egress on zone loss	One zonal NAT GW + subnet per zone
9	Need more than ~1.03M ports to one dest	Hit the 16-IP-per-NAT-GW cap	Prefix already /28; `SNATConnectionCount` ceiling	Split workload across subnets, each its own NAT GW
10	Egress works but partner blocks you	They allow-listed loose IPs; you scaled and IP changed	Compare current attached IPs vs partner’s list	Use a contiguous prefix; publish the CIDR once
11	UDR sends traffic to a firewall, bypassing NAT GW	Route table overrides the egress path	`az network route-table route list`; effective routes	Decide: NAT GW or forced-tunnel via firewall
12	“Self-healing” failures every peak window	Exhaustion that clears when load drops	`DroppedPackets` rises and falls with load	Size the prefix to peak; fix connection reuse

The expanded form, for the entries that bite hardest:

1. Intermittent timeouts/5xx to one upstream under load, fine at rest. Root cause: SNAT port exhaustion, meaning too many concurrent flows to one destination 5-tuple, usually from per-request connections. Confirm: NAT Gateway metrics show DroppedPackets > 0 during the window and SNATConnectionCount flattening near 64,512 × (attached IPs); correlate with the load window. Fix: Reuse connections (pooled HTTP client + keepalives) first; then attach another public IP or prefix to the NAT Gateway (no downtime) and/or lower the idle timeout. Scaling out instances does not help, because NAT Gateway already pools ports across the subnet.

2. Egress IP is not the prefix you provisioned. Root cause: The subnet was never associated, or you are reading a NIC’s instance-level public IP and assuming it is the egress identity. Confirm: az network nat gateway show --query "subnets[].id" should list the workload subnet; curl -s https://api.ipify.org from inside the subnet must return a prefix IP. Fix: Associate the subnet. Remember NAT Gateway wins outbound even when a NIC has its own public IP (which still serves inbound).

3. Association fails or is unsupported. Root cause: Something in the subnet is Basic SKU (a Basic public IP or Basic Load Balancer), which NAT Gateway cannot coexist with. Confirm: az network public-ip list -g $RG --query "[].{name:name, sku:sku.name}"; check any LB’s SKU. Fix: Upgrade every public IP and LB in the subnet to Standard SKU; re-attempt the association.

4. Long-lived connections reset mid-idle. Root cause: The idle timeout is shorter than the connection’s idle gap, so the port is reclaimed and the next packet finds a dead translation. Confirm: Check idleTimeoutInMinutes; application logs show resets clustering at exactly that interval. Fix: Add application keepalives (the robust fix) and, if appropriate, raise the idle timeout modestly — never jump straight to 120.

5. Exhaustion despite “plenty of ports.” Root cause: Every flow targets one destination IP:port, so they all share a single 5-tuple budget, and ports left unused by other destinations do not help. Confirm: App Insights dependencies | summarize count() by target shows one host dominating; the port ceiling is per-destination. Fix: Reuse connections to that host (fewer flows), or shard across multiple destination endpoints if the upstream offers them.

6. AKS egress IP is still the Load Balancer, not the NAT Gateway. Root cause: outboundType was left at the default loadBalancer. Confirm: az aks show -g rg-aks-prod -n aks-prod --query "networkProfile.outboundType" -o tsv. Fix: outboundType is set at creation — recreate the cluster (or follow a supported migration path) with managedNATGateway or userAssignedNATGateway.

8. Multi-zone cluster egress pinned to one zone. Root cause: One zonal NAT Gateway is shared across a node subnet that spans zones, so losing that zone takes egress with it. Confirm: The NAT Gateway shows a single zones value while node pools span multiple zones. Fix: Deploy one zonal NAT Gateway and subnet per availability zone; publish the union of the prefixes. (Routing-side egress mysteries — UDRs overriding the path — are diagnosed in Troubleshooting VNet Connectivity: NSG, UDR, Effective Routes & Network Watcher.)
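For entry 5, the question to answer quickly is which single destination dominates the flow count. If you can get one `dst_ip:dst_port` line per connection (from connection logs or a socket listing on the client, for example), a short pipeline ranks them. This is a sketch under that input-format assumption; the helper name `busiest_destination` and the sample addresses are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "dst_ip:dst_port" lines on stdin and print the
# busiest destination followed by its flow count.
busiest_destination() {
  sort | uniq -c | sort -rn | head -n 1 | awk '{ print $2, $1 }'
}

# Sample input using documentation-range addresses.
printf '%s\n' \
  198.51.100.7:443 198.51.100.7:443 198.51.100.7:443 \
  192.0.2.10:443 \
  | busiest_destination
# -> 198.51.100.7:443 3
```

If one destination accounts for nearly all flows, compare its count against 64,512 × attached IPs rather than against the gateway's total capacity.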

Best practices

Make NAT Gateway the default egress for any subnet with meaningful outbound — default outbound is being retired for new VNets and gives you no stable IP.
Use a public IP prefix, not loose IPs, so partners allow-list one contiguous CIDR and you scale within it without churn.
Size the prefix from ceil(peak_flows_to_one_dest / 64,512), with modest headroom — never provision a /28 “to be safe.”
Fix connection reuse in the application first. Pooled clients with keepalives prevent most exhaustion before any port sizing matters.
Set the idle timeout deliberately — 4 minutes to recycle short-lived flows, higher only when paired with application keepalives.
For AKS, choose outboundType at cluster creation — it is largely immutable, so design it in for any cluster that calls allow-listed endpoints.
For zone redundancy, deploy one zonal NAT Gateway + prefix per availability zone and publish the union of CIDRs; never share one across a multi-zone subnet.
Keep everything in the subnet Standard SKU — a single Basic public IP or LB blocks the association.
Alert on DroppedPackets > 0 and dashboard SNATConnectionCount so you catch the ceiling before users feel it.
Validate egress from inside the subnet/pod with an IP-echo service; the returned IP must always fall inside your prefix.
Decide NAT Gateway or forced-tunnel via firewall, not both by accident — a UDR to a firewall overrides the NAT Gateway egress path.
Document the final egress CIDR(s) as an immutable allow-list contract with each partner, versioned in your IaC repo.

Security notes

Egress identity is a security boundary. A stable, allow-listable prefix lets partners restrict who can reach them to your CIDR — make that contract explicit and version it, because a silent IP change becomes an outage and a security review.
NAT Gateway translates but does not inspect. It is not a firewall — it provides no FQDN filtering, no L7 rules, no threat intel. If egress must be controlled (allow only certain destinations), pair or replace it with Azure Firewall: Forced Tunneling & Hub-Spoke Routing.
Prefer Private Endpoints for Azure PaaS targets. Traffic to Storage, SQL, Key Vault and the like should stay on the Microsoft backbone and never consume SNAT or traverse the public internet — see Private Endpoint vs Service Endpoint.
Keep NSGs on the subnet. NAT Gateway does not replace network security groups; egress flows still pass through your NSG rules, so least-privilege outbound rules still apply.
Reduce blast radius with separate egress prefixes per environment. Production, staging and dev should egress through distinct prefixes so a compromised non-prod workload cannot impersonate prod’s allow-listed identity.
Audit the attached IPs as configuration. Changes to the prefix or attached IPs are security-relevant; manage them in IaC and review them in PRs, not by hand in the portal.

The egress-security knobs side by side:

Control	Mechanism	Secures against	Note
Stable egress identity	Public IP prefix	Partner over-permissive allow-lists	Publish once; never churn
Egress filtering	Azure Firewall (not NAT GW)	Exfiltration to arbitrary hosts	NAT GW does not filter
PaaS off the public path	Private Endpoint	Public-internet exposure of PaaS	Bypasses SNAT entirely
Least-privilege egress	NSG outbound rules	Unwanted destinations/ports	Still applies under NAT GW
Per-env isolation	Separate prefixes	Cross-env identity reuse	Distinct allow-lists
Config auditing	IaC + PR review	Silent IP/prefix drift	Treat egress IPs as code
Egress logging	Flow logs / firewall logs	Undetected anomalous egress	NAT GW itself does not log flows

Cost & sizing

The bill drivers are simple and bounded:

NAT Gateway has an hourly resource charge plus a per-GB data-processing charge for traffic flowing through it. The hourly cost is per gateway, so a zone-redundant design (three gateways) is roughly 3× the hourly base — budget for it deliberately.
Public IPs are charged per IP-hour. A /30 prefix is four IPs of cost whether or not you use all the ports; this is exactly why you size to peak rather than over-provisioning a /28.
Per-GB data processing dominates for high-egress workloads. A chatty service moving terabytes pays far more in processing than in the hourly base; reducing needless egress (caching, fewer round-trips) is the real cost lever.
The free alternative — default outbound — is a false economy at any production scale: it gives no stable IP and is being retired, and the cost of a SNAT-exhaustion outage during a sale dwarfs the NAT Gateway bill.

A rough monthly picture (figures are indicative; confirm current regional pricing):

Configuration	What you pay for	Rough INR / month	When it fits
1× NAT GW + /31 prefix, low traffic	1 GW hourly + 2 IPs + light per-GB	~₹2,500–4,000	Single-zone, modest egress
1× NAT GW + /30 prefix, medium traffic	1 GW + 4 IPs + moderate per-GB	~₹4,000–7,000	Production single-zone
1× NAT GW + /28 prefix	1 GW + 16 IPs + per-GB	~₹7,000–11,000	Single-zone, very high concurrency
3× NAT GW (per-zone) + 3× /30	3 GW hourly + 12 IPs + per-GB	~₹12,000–20,000	Zone-redundant production
High-egress (TB/month) any layout	Per-GB processing dominates	per-GB drives it	Data-heavy egress
Attach one more IP to the NAT GW	+1 IP-hour, no GW change	small delta	Quick capacity bump, no downtime
Default outbound (for contrast)	Nothing (until retired)	~₹0	Non-prod only; no stable IP

Sizing rule of thumb, distilled:

You have…	Provision	Idle timeout	Why
< 65K peak flows to one dest	1 IP (/32 or single)	4–10 min	One IP covers it
~65K–130K	/31 (2 IPs)	4–10 min	Two IPs, headroom
~130K–260K	/30 (4 IPs)	4 min	Aggressive recycle
~260K–520K	/29 (8 IPs)	4 min	Larger pool
~520K–1.03M	/28 (16 IPs)	4 min	Max on one GW
> 1.03M to one dest	Split across subnets/GWs	4 min	Past the 16-IP cap
Long-lived bursty (brokers/DBs)	size to peak	30–60 min + keepalive	Avoid mid-idle resets
Multi-zone HA	per-zone /30 each	4 min	One GW per zone
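The table rows all come from one formula, required_IPs = ceil(peak_flows / 64,512), rounded up to the next power-of-two prefix. A minimal sketch of that calculation, using the article's 64,512-ports-per-IP and 16-IP-per-gateway figures; the function names `required_ips` and `prefix_length` are hypothetical.

```shell
#!/bin/sh
PORTS_PER_IP=64512   # SNAT ports per attached public IP
MAX_IPS_PER_GW=16    # per-NAT-Gateway public IP cap

# required_ips = ceil(peak_flows / 64,512), done in integer arithmetic.
required_ips() {
  echo $(( ($1 + PORTS_PER_IP - 1) / PORTS_PER_IP ))
}

# Smallest prefix length whose address count covers the required IPs.
# Returns non-zero when the demand exceeds one gateway's 16-IP cap.
prefix_length() {
  need=$(required_ips "$1")
  if [ "$need" -gt "$MAX_IPS_PER_GW" ]; then
    echo "needs $need IPs: split across subnets/gateways" >&2
    return 1
  fi
  size=1
  len=32
  while [ "$size" -lt "$need" ]; do
    size=$(( size * 2 ))
    len=$(( len - 1 ))
  done
  echo "/$len"
}

prefix_length 140000   # 3 IPs needed, so the smallest covering prefix is /30
```

The input is peak concurrent flows to the busiest single destination, not total flows across all destinations; feeding it the wrong number over-sizes the prefix.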

Interview & exam questions

1. What are the three outbound paths in Azure, and why does only NAT Gateway scale? Default outbound access (implicit, small shared pool, unpredictable IP, retiring Sept 2025), Load Balancer outbound rules (a fixed 64K budget you must pre-divide across the backend pool), and NAT Gateway (on-demand allocation from a large shared pool, ~64,512 ports per attached public IP, stable egress identity). NAT Gateway scales because ports are handed out dynamically across the subnet rather than pre-carved per instance.

2. What exactly is a SNAT port, and why is exhaustion a single-destination problem? A SNAT port is one entry in the translation table, keyed on the full 5-tuple — source IP, source port, destination IP, destination port, protocol. You are limited to ~64K connections to the same destination IP and port, not 64K total. Exhaustion therefore almost always means many concurrent flows to one VIP (a payment gateway, one storage endpoint); spreading load across destinations rarely exhausts ports.

3. How do you size a public IP prefix? Count the peak concurrent flows to the busiest single destination and compute required_IPs = ceil(peak_flows / 64,512), then pick the smallest prefix whose host count covers it. For ~140,000 flows that is 3 IPs → a /30 (4 IPs) with headroom. Never over-provision a /28 “to be safe” — you pay per IP and can add a prefix later with no downtime.

4. What is the outbound precedence when multiple egress configs exist? Highest to lowest: NAT Gateway (wins if present on the subnet), then an instance-level public IP on the NIC, then Load Balancer outbound rules, then default outbound access. Notably, NAT Gateway overrides a NIC’s own public IP for outbound: that IP still serves inbound, but egress goes through the NAT Gateway. The one exception is a UDR that sends internet-bound traffic to a firewall or other virtual appliance, which takes the flow off the NAT Gateway path entirely.

5. What is the per-NAT-Gateway IP cap and what do you do beyond it? A single NAT Gateway supports a maximum of 16 public IPs (individual + prefixes combined), giving ~1.03M ports to one destination via a /28. Beyond that, split the workload across multiple subnets, each with its own NAT Gateway and prefix — the documented scaling pattern.

6. How do you give AKS deterministic, allow-listable egress? Set outboundType at cluster creation to userAssignedNATGateway (you attach your own NAT Gateway + prefix to the node subnet, so partners allow-list your exact CIDR) or managedNATGateway (Azure provisions it). The default loadBalancer inherits LB SNAT limits; outboundType is largely immutable, so it must be chosen up front.

7. How do you make egress zone-redundant given that NAT Gateway is zonal? A NAT Gateway is single-zone, so you deploy one zonal NAT Gateway and node-pool subnet per availability zone. Pods in each zone egress through that zone’s gateway, and partners allow-list the union of the zonal prefixes. Sharing one gateway across a multi-zone subnet pins all egress to a single zone’s fate.

8. What does the TCP idle timeout do, and how should you set it? It controls how long an idle flow holds its SNAT port before reclaim (4–120 minutes, default 4). Lowering it frees ports faster (more effective capacity); raising it keeps quiet long-lived connections alive but holds ports longer. The durable fix for mid-idle resets is application keepalives, not cranking the timeout to 120.

9. Which NAT Gateway metric is the smoking gun for exhaustion, and what else do you watch? DroppedPackets — any sustained non-zero value strongly indicates exhaustion or capacity pressure. Watch it alongside SNATConnectionCount (the headline established-connection count, compared against 64,512 × attached IPs) and TotalConnectionCount. Alert on DroppedPackets > 0 over 5 minutes.

10. Your app exhausts SNAT despite “plenty of ports” — why? Because every flow targets the same destination IP and port, so they all consume one 5-tuple budget; total ports across other destinations are irrelevant. Confirm with App Insights dependencies grouped by target (one host dominates). Fix with connection reuse to that host, or shard across multiple endpoints if available.

11. Can NAT Gateway filter or inspect egress? No — it performs source NAT only; it has no FQDN filtering, L7 rules, or threat intelligence. For controlled egress (allow only specific destinations) you use Azure Firewall (often with forced tunneling), and for Azure PaaS you prefer Private Endpoints so traffic never traverses the public internet or consumes SNAT at all.

12. Why does NAT Gateway require Standard SKU everywhere in the subnet? NAT Gateway is a Standard-SKU resource and cannot coexist with Basic-SKU public IPs or Basic Load Balancers in the same subnet; the association is unsupported. Upgrade every public IP and LB in the subnet to Standard before attaching it.

These map to AZ-700 (Network Engineer Associate) — design and implement network connectivity and routing, hybrid and outbound connectivity — and touch AZ-104 (Administrator) for virtual networking and AZ-305 for designing resilient, allow-listable egress. A compact cert mapping:

Question theme	Primary cert	Objective area
Outbound paths, SNAT model, prefix sizing	AZ-700	Design & implement outbound connectivity
AKS `outboundType`, zone-redundant egress	AZ-700 / CKA-adjacent	Cluster networking design
Precedence, SKU constraints, association	AZ-104	Configure virtual networking
Allow-listable, resilient egress design	AZ-305	Design network architecture
Egress filtering vs translation	AZ-700 / AZ-500	Secure network connectivity

Quick check

Why does SNAT exhaustion almost always involve a single destination, and what part of the 5-tuple makes that true?
You have ~140,000 peak concurrent flows to one bank VIP. What prefix do you provision, and what is the per-IP port figure you divided by?
True or false: scaling out to more VM instances behind a NAT Gateway adds SNAT ports.
Your zone-redundant AKS cluster’s egress survives a single instance failure but not a zone outage. What did you get wrong, and what is the fix?
Long-lived broker connections keep resetting after exactly four minutes of quiet. Name the property involved and the robust fix.

Answers

The translation table is keyed on the full 5-tuple including destination IP and destination port; you can have ~64,512 flows to one dst IP:port per public IP. Many flows to one VIP share that single budget and exhaust it, while the same number spread across destinations would not.
ceil(140000 / 64512) = 3 IPs, so provision a /30 (4 IPs, ~258K ports, with headroom). The divisor is 64,512 SNAT ports per attached public IP.
False. NAT Gateway already pools ports across the whole subnet on demand; adding instances does not add ports. To add capacity you attach another public IP or prefix to the NAT Gateway (or lower the idle timeout / fix connection reuse).
You shared one zonal NAT Gateway across a multi-zone node subnet, pinning egress to that zone. The fix is one zonal NAT Gateway and node-pool subnet per availability zone, publishing the union of the zonal prefixes to the partner.
The TCP idle timeout (default 4 minutes) is reclaiming the idle flow’s SNAT port. The robust fix is application-level keepalives (or HTTP keep-alive / pooling) that reset the idle timer — preferable to merely raising the timeout toward 120.

Glossary

Source NAT (SNAT) — rewriting a private source IP and port to a public IP and port so outbound traffic can traverse the internet and return.
SNAT port — one entry in the NAT translation table, keyed on the full 5-tuple; the finite resource that exhausts.
5-tuple — the tuple (source IP, source port, destination IP, destination port, protocol) that uniquely identifies a flow; the reason exhaustion is per-destination.
NAT Gateway — a managed, highly available, outbound-only Azure resource attached to a subnet that performs SNAT through your public IP/prefix with on-demand port allocation.
Public IP prefix — a contiguous block of public IPs (e.g. /30, /28) you attach to a NAT Gateway and publish for partner allow-listing.
Default outbound access — the implicit egress a VM gets with no explicit outbound configured; small shared pool, unpredictable Microsoft-owned IP, retiring for new VNets on 30 September 2025.
Load Balancer outbound rules — explicit outbound port allocation off a Standard LB frontend; a fixed 64K budget you must pre-divide across the backend pool.
Idle timeout — how long an idle flow holds its SNAT port before reclaim (4–120 minutes, default 4); a capacity and connection-reset lever.
outboundType — the AKS cluster networking setting choosing egress mode: loadBalancer, managedNATGateway, userAssignedNATGateway, or userDefinedRouting; largely immutable after creation.
SNATConnectionCount — NAT Gateway metric for established outbound connections; the headline capacity number.
DroppedPackets — NAT Gateway metric whose sustained non-zero value is the smoking gun for SNAT exhaustion.
TotalConnectionCount — NAT Gateway metric for active flows through the gateway.
TCP keepalive — periodic packets that reset the idle timer so long-lived but quiet connections are not reclaimed; the robust alternative to raising the idle timeout.
Zonal resource — a resource pinned to a single availability zone; a NAT Gateway is zonal, so multi-zone egress needs one per zone.

Next steps

You can now give any subnet deterministic, allow-listable, exhaustion-proof egress and prove it from inside the network. Build outward:

Next: Standard Load Balancer Outbound Rules, Cross-Region & HA Ports — the alternative outbound mechanism and exactly when you would still reach for it.
Related: Private Endpoint vs Service Endpoint — take Azure PaaS traffic off the public path entirely so it never consumes SNAT.
Related: Azure Firewall: Forced Tunneling & Hub-Spoke Routing — when egress must be inspected and filtered, not merely translated.
Related: Azure VNet Deep Dive: Every Setting — the subnet, NSG and address-space fundamentals underneath every egress decision.
Related: Troubleshooting VNet Connectivity: NSG, UDR, Effective Routes & Network Watcher — when a route table, not SNAT, is hijacking your egress path.