AWS Gateway Load Balancer: Transparent Inline Inspection with Third-Party Appliances

You have a third-party firewall, IDS, or DLP appliance your security team insists on running, and a fleet of VPCs whose traffic must pass through it. The naive answer is a routed NVA sandwich: give the appliance two network interfaces, enable IP forwarding, disable source/destination checks, point VPC route tables at it, and pray that flows stay symmetric as you scale. It works for a pair of boxes. It does not survive horizontal scaling, because you cannot load-balance a transparent L3 appliance with a normal load balancer without rewriting the packet and destroying the very transparency the appliance needs.

Gateway Load Balancer (GWLB) exists precisely to fill that gap. It is a Layer 3/4 load balancer that distributes flows across a fleet of appliances, encapsulates the original, untouched packet in GENEVE, and hands it to the appliance so the appliance sees the real source and destination. The appliance inspects, then returns the packet — still encapsulated — and GWLB de-encapsulates and forwards it on. The source and destination workloads are oblivious. This article builds that pattern end to end: GENEVE mechanics, the GWLB-endpoint topology behind Transit Gateway, the route-table choreography for ingress, egress, and east-west, and the flow-stickiness behavior that keeps both directions of a connection on the same appliance.

1. Why GWLB: bump-in-the-wire without re-architecting endpoints

Three properties make GWLB different from an Application or Network Load Balancer, and all three matter for inline inspection:

It preserves the original packet. GWLB does not terminate the connection, rewrite the 5-tuple, or NAT. It wraps the whole original IP packet inside a GENEVE header and tunnels it to the appliance. The appliance sees the real client IP and the real destination IP. That is what “transparent” means, and it is why a security appliance can do real source-based policy and accurate logging.
It load-balances at L3/L4 across a horizontal fleet. Appliances register in a target group. GWLB hashes each flow and pins it to one appliance, so you scale capacity by adding appliances, not by buying a bigger box. Health checks pull dead appliances out of rotation automatically.
It separates the appliance VPC from the traffic it inspects via a PrivateLink-style endpoint. GWLB publishes a service; consumers create a Gateway Load Balancer endpoint (GWLBE) in their own VPC. Traffic is steered to the GWLBE with route tables, crosses into the appliance VPC over AWS PrivateLink, gets inspected, and comes back. The appliance fleet lives in its own account/VPC, owned by the security team, decoupled from the workload accounts.

GWLB is a bump in the wire. It does not originate or terminate flows and it does not route on its own. You are responsible for bending route tables around the GWLB endpoint. Nothing about traffic steering is implicit — and that is where almost every broken deployment goes wrong.

The combination lets you insert an arbitrary third-party appliance fleet into the path of ingress, egress, and east-west traffic without changing a single source or destination, and scale it horizontally with health-checked failover.

2. GENEVE encapsulation: what the appliance actually receives

GWLB speaks GENEVE (Generic Network Virtualization Encapsulation, RFC 8926) on UDP port 6081. This is non-negotiable: the appliance must understand GENEVE, and its security group must allow inbound UDP 6081. Vendor images marketed as “GWLB-compatible” (Palo Alto VM-Series, Fortinet FortiGate, Check Point CloudGuard, Aviatrix, plus open-source stacks) ship a GENEVE handler.

The flow looks like this:

 Original packet:        [ IP: client -> server | TCP | payload ]

 GWLB encapsulates:      [ IP: GWLB -> appliance | UDP 6081 | GENEVE hdr | <original packet> ]
                                                              ^ TLV options
                                                                carry flow cookie + GWLBE id

 Appliance inspects the INNER packet (sees real client -> server),
 then returns the SAME GENEVE-wrapped packet back to GWLB.

Key facts that drive your appliance configuration:

The outer header is GWLB-to-appliance over UDP 6081. The inner header is the original, byte-for-byte unmodified packet. The appliance applies policy to the inner packet.
GENEVE carries TLV options that include a per-flow identifier (often called the flow cookie) and the GWLBE identifier. A correctly written appliance integration echoes these options back unchanged on the return packet. GWLB uses them to map the inspected packet back to the correct flow and endpoint. If the appliance strips or rewrites them, return traffic breaks.
The appliance runs in either two-arm or single-arm GENEVE mode, as its vendor docs prescribe. Most modern GWLB integrations are single-arm: one data interface receives the GENEVE tunnel, inspects, and sends the packet back out the same interface. You do not configure two routed NICs the way a classic NVA sandwich does.
The appliance can allow (return the packet) or drop (do not return it) per flow. Dropping is how the appliance enforces a deny: GWLB simply never gets the packet back and the flow dies.

This is the crucial mental shift from a routed NVA: the appliance is not a router in the path. It is a GENEVE tunnel endpoint that GWLB feeds. Its own forwarding table plays no part in steering the flow; it receives someone else’s traffic, encapsulated, and either returns it or drops it.
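To make the header layout concrete, the following shell sketch decodes the fixed 8-byte GENEVE base header defined in RFC 8926 from a hex string, as you might copy it out of a packet capture. The function name parse_geneve_base and the sample bytes are illustrative assumptions, not output from a real GWLB; the 32-byte option length in the sample is chosen only to show how the offset of the inner packet is derived.

```shell
# Decode the 8-byte GENEVE base header (RFC 8926) from a hex string.
# Prints the version, the options length in bytes, the inner protocol
# (EtherType), the VNI, and the offset from the start of the GENEVE
# header at which the original (inner) packet begins.
parse_geneve_base() (
  hex=$1
  # The base header is 8 bytes, i.e. 16 hex digits.
  [ "${#hex}" -ge 16 ] || { echo "need at least 16 hex digits" >&2; exit 1; }

  b0=$(printf '%s' "$hex" | cut -c1-2)     # Ver (2 bits) + Opt Len (6 bits)
  proto=$(printf '%s' "$hex" | cut -c5-8)  # EtherType of the inner payload
  vni=$(printf '%s' "$hex" | cut -c9-14)   # 24-bit Virtual Network Identifier

  version=$(( 0x$b0 >> 6 ))
  # Opt Len counts 4-byte words, so multiply to get bytes.
  opt_len=$(( (0x$b0 & 0x3f) * 4 ))

  echo "version=$version opt_len_bytes=$opt_len protocol=0x$proto vni=$(( 0x$vni )) inner_offset=$(( 8 + opt_len ))"
)

# Hypothetical sample: version 0, 8 option words, inner IPv4 (0x0800), VNI 0.
parse_geneve_base 0800080000000000
# -> version=0 opt_len_bytes=32 protocol=0x0800 vni=0 inner_offset=40
```

The inner_offset value is where an appliance (or a capture filter) would start reading the original packet; the option bytes between the base header and that offset are the TLVs the appliance is expected to echo back unchanged.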

3. Topology: GWLB, GWLB endpoints, and the inspection VPC behind Transit Gateway

For anything past a couple of VPCs, the model that scales is a centralized inspection VPC behind a Transit Gateway, with GWLB endpoints doing the steering. Layout, per Availability Zone:

                         Internet
                            |
                          [ IGW ]
                            |
        +------------------ Inspection VPC ------------------+
        |   [ NAT GW subnet ]                                |
        |   [ GWLBE subnet ]  <- Gateway LB endpoint (GWLBE) |
        |   [ Appliance subnet ] <- GWLB targets (firewalls) |   GWLB + target group live here
        |   [ TGW attach subnet ] (appliance-mode attachment)|
        +----------------------------------------------------+
                            |
                  [ Transit Gateway ]  appliance mode ON
                   /          |          \
            Spoke VPC A   Spoke VPC B   Egress VPC
            (no IGW)      (no IGW)      (NAT + IGW)

What lives where:

GWLB and its target group live in the inspection VPC. Targets are the appliance ENIs (or an Auto Scaling group of appliances).
GWLB endpoints (GWLBE) are created in whichever VPC needs traffic steered into the appliance fleet. In the centralized model the GWLBEs typically sit in the inspection VPC itself (for egress to the internet) and the TGW pulls spoke traffic in.
Spoke VPCs have no internet path of their own. Their default route points to the TGW. The TGW route table sends that traffic to the inspection VPC attachment.
Appliance mode is enabled on the inspection VPC’s TGW attachment. This is the single most important flag for multi-AZ symmetry, covered in detail below.

The decoupling is the point: the security team owns the inspection VPC, the appliance AMIs, the rule sets, and the GWLB. Workload teams own their spokes and never see the appliances. The contract between them is the GWLB endpoint service and the TGW.

4. Build the GWLB, target group, and endpoint service

Create the target group first. For GWLB, the protocol is GENEVE and the port is 6081, and you typically health-check the appliance over TCP or HTTP on a port the appliance only serves when its data plane is alive.

# Target group for the appliance fleet. Protocol GENEVE, port 6081.
aws elbv2 create-target-group \
  --name tg-inspection-appliances \
  --protocol GENEVE --port 6081 \
  --vpc-id vpc-inspection \
  --target-type instance \
  --health-check-protocol TCP \
  --health-check-port 80 \
  --health-check-interval-seconds 10 \
  --healthy-threshold-count 3 \
  --unhealthy-threshold-count 3

Create the Gateway Load Balancer itself (--type gateway) with one subnet per AZ in the appliance subnets, then a listener that forwards to the target group. A GWLB has exactly one listener and it has no port/protocol of its own — all traffic flows through it:

# Gateway Load Balancer, one subnet per AZ.
aws elbv2 create-load-balancer \
  --name gwlb-inspection \
  --type gateway \
  --subnets subnet-appl-az1 subnet-appl-az2

# Listener: GWLB listeners forward all traffic to the target group.
aws elbv2 create-listener \
  --load-balancer-arn arn:aws:elasticloadbalancing:...:loadbalancer/gwy/gwlb-inspection/... \
  --default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:...:targetgroup/tg-inspection-appliances/...

# Register appliance instances into the target group.
aws elbv2 register-targets \
  --target-group-arn arn:aws:elasticloadbalancing:...:targetgroup/tg-inspection-appliances/... \
  --targets Id=i-appliance-az1 Id=i-appliance-az2

# Publish GWLB as a VPC endpoint service (PrivateLink for GWLB).
aws ec2 create-vpc-endpoint-service-configuration \
  --gateway-load-balancer-arns arn:aws:elasticloadbalancing:...:loadbalancer/gwy/gwlb-inspection/... \
  --no-acceptance-required

That last call returns a service name (for example com.amazonaws.vpce.us-east-1.vpce-svc-0abc123def456). Endpoint consumers use that service name to create GWLB endpoints.

Now create a GWLB endpoint in each AZ of the VPC that needs steering. The endpoint type is GatewayLoadBalancer:

# One GWLBE per AZ, in a dedicated GWLBE subnet.
aws ec2 create-vpc-endpoint \
  --vpc-endpoint-type GatewayLoadBalancer \
  --vpc-id vpc-inspection \
  --service-name com.amazonaws.vpce.us-east-1.vpce-svc-0abc123def456 \
  --subnet-ids subnet-gwlbe-az1

Each GWLBE gets a vpc-endpoint-id (for example vpce-0aa11bb22cc33dd44). That endpoint id is what you set as the route target. Critical multi-AZ rule: create one GWLBE per AZ and reference the local AZ’s endpoint in each AZ’s route table. Sending us-east-1a traffic to a us-east-1b endpoint adds a cross-AZ hop and breaks symmetry assumptions.

5. Route-table choreography: ingress, egress, and east-west

This is where deployments live or die. The appliance only inspects traffic that route tables actually push through the GWLB endpoint. There is nothing implicit. Below is the centralized egress pattern (spokes reach the internet through the inspection VPC).

The GWLB endpoint id is referenced in routes via --vpc-endpoint-id.

Spoke VPC default route -> Transit Gateway:

# Spokes have no IGW. Default route goes to the TGW, which pulls
# the traffic into the inspection VPC.
aws ec2 create-route \
  --route-table-id rtb-spoke-a \
  --destination-cidr-block 0.0.0.0/0 \
  --transit-gateway-id tgw-0abc123

Inspection VPC — TGW attachment subnet route table (traffic arriving from spokes): send everything to the local GWLB endpoint so it hits the appliance before egress.

# TGW-attachment subnet (us-east-1a): force inbound spoke traffic
# through the local GWLB endpoint on its way out.
aws ec2 create-route \
  --route-table-id rtb-tgw-attach-az1 \
  --destination-cidr-block 0.0.0.0/0 \
  --vpc-endpoint-id vpce-0aa11bb22cc33dd44

Inspection VPC — GWLBE subnet route table (inspected traffic continuing outbound): after the appliance returns the packet and GWLB de-encapsulates, the packet egresses out of the GWLBE subnet. Point its default route at the NAT gateway.

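# GWLBE subnet (us-east-1a): inspected egress traffic -> NAT GW.
aws ec2 create-route \
  --route-table-id rtb-gwlbe-az1 \
  --destination-cidr-block 0.0.0.0/0 \
  --nat-gateway-id nat-0az1
# Inspected return and east-west traffic exits here too: route spoke CIDRs to the TGW.
aws ec2 create-route \
  --route-table-id rtb-gwlbe-az1 \
  --destination-cidr-block 10.0.0.0/8 \
  --transit-gateway-id tgw-0abc123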

Inspection VPC — NAT subnet route table (return traffic from the internet): the return packet comes back from the NAT gateway and must be steered back through the same GWLB endpoint so the appliance sees both directions. Route the return toward the spoke CIDRs via the GWLB endpoint, and the default toward the IGW.

# NAT subnet (us-east-1a): return traffic to spokes must re-enter
# the appliance via the GWLB endpoint to preserve symmetry.
aws ec2 create-route \
  --route-table-id rtb-nat-az1 \
  --destination-cidr-block 10.0.0.0/8 \
  --vpc-endpoint-id vpce-0aa11bb22cc33dd44

# And the internet-facing default for the NAT subnet.
aws ec2 create-route \
  --route-table-id rtb-nat-az1 \
  --destination-cidr-block 0.0.0.0/0 \
  --gateway-id igw-0abc123

The pattern is a forced hairpin through the GWLB endpoint on the way out and on the way back. For east-west inspection between spokes, the same idea applies at the TGW: spoke-to-spoke traffic routes via the inspection VPC attachment, hits the GWLBE, gets inspected, and returns — you simply do not give the TGW a direct spoke-to-spoke route that bypasses the inspection attachment.

Repeat every route table per AZ, each referencing its own local GWLB endpoint. A single mismatched route that points an AZ at the wrong endpoint, or that lets traffic skip the GWLBE, is the textbook cause of “inspection is deployed but half the traffic isn’t being seen.”
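The per-AZ rule lends itself to a mechanical check. The sketch below assumes you have already flattened your route tables into lines of the form "route-table-id route-table-az endpoint-id endpoint-az" (a hypothetical format; how you produce it, for example from describe-route-tables and describe-vpc-endpoints output, is up to you). The function name audit_az_symmetry is likewise an assumption. It prints every route table whose GWLB endpoint lives in a different AZ and exits non-zero if it finds one.

```shell
# Read "<route-table-id> <route-table-az> <endpoint-id> <endpoint-az>" lines
# on stdin. Report each route table that points at an endpoint in another AZ.
# Exit status is 1 if any mismatch was found, 0 otherwise.
audit_az_symmetry() {
  awk '
    NF == 4 && $2 != $4 {
      printf "MISMATCH %s (%s) -> %s (%s)\n", $1, $2, $3, $4
      bad = 1
    }
    END { exit bad + 0 }
  '
}

# Example input: the az2 NAT route table wrongly reuses the az1 endpoint.
printf '%s\n' \
  'rtb-nat-az1 us-east-1a vpce-0aa11bb22cc33dd44 us-east-1a' \
  'rtb-nat-az2 us-east-1b vpce-0aa11bb22cc33dd44 us-east-1a' \
  | audit_az_symmetry || echo "asymmetric routes found"
```

Because the function only compares two AZ columns, it catches the copy-paste class of mistakes but not routes that skip the endpoint altogether; those still need the route-table review described above.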

6. Flow stickiness: keeping both directions on the same appliance

Stateful appliances demand that the forward and return packets of a connection hit the same appliance. The appliance that saw the SYN holds the session-table entry; if the SYN-ACK lands on a different appliance, it is out-of-state and gets dropped. GWLB enforces this with flow stickiness based on hashing, and you have to understand the two layers where symmetry can break.

Layer 1 — GWLB flow hashing. GWLB pins a flow to a target using a hash. By default it uses the 5-tuple (source IP, source port, destination IP, destination port, protocol). It can also be configured for 3-tuple (source IP, destination IP, protocol) or 2-tuple (source IP, destination IP). Because GWLB computes the same hash for the forward and the return flow of a given connection, both directions select the same appliance — as long as the same GWLB sees both directions. For protocols where the return 5-tuple is a clean mirror (the typical TCP/UDP case), 5-tuple is correct. For fragmented traffic or protocols where ports are not symmetric, 3-tuple or 2-tuple avoids splitting a logical flow:

# Set flow stickiness to 3-tuple if 5-tuple splits your traffic
# (e.g. heavy fragmentation). stickiness.type accepts
# source_ip_dest_ip_proto (3-tuple) or source_ip_dest_ip (2-tuple).
# The target_failover keys set rebalancing; both must carry the same value.
aws elbv2 modify-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:...:targetgroup/tg-inspection-appliances/... \
  --attributes Key=stickiness.enabled,Value=true \
               Key=stickiness.type,Value=source_ip_dest_ip_proto \
               Key=target_failover.on_deregistration,Value=rebalance \
               Key=target_failover.on_unhealthy,Value=rebalance

target_failover controls what happens to existing flows when a target is deregistered or goes unhealthy: no_rebalance (default) keeps existing flows pinned to the now-gone target until they expire (they break), while rebalance moves them to a healthy target. For stateless inspection, rebalance recovers faster; for stateful appliances without session sync, the moved flow is out-of-state anyway, so weigh it against your appliance behavior. Flow-stickiness tuple mode is set through the stickiness.enabled and stickiness.type attributes in the same call; leaving them at their defaults keeps 5-tuple hashing.
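Why a hash can send both directions of a connection to the same target is easier to see in a toy model. AWS does not publish GWLB's hash function, so the sketch below is only an illustration of the idea: it orders the two endpoints before hashing, so a flow and its mirror image produce the same key. The function name flow_target, the use of cksum as the hash, and the sample addresses are all assumptions made for the example.

```shell
# flow_target SRC_IP SRC_PORT DST_IP DST_PORT PROTO N_TARGETS
# Illustrative symmetric flow hash (NOT the real GWLB algorithm).
# Prints the index (0..N_TARGETS-1) of the target chosen for the flow.
flow_target() (
  a="$1:$2"
  b="$3:$4"
  # Canonical ordering: whichever endpoint sorts first goes first, so
  # swapping source and destination yields the same key.
  first=$(printf '%s\n%s\n' "$a" "$b" | LC_ALL=C sort | head -n 1)
  if [ "$first" = "$a" ]; then
    key="$a|$b|$5"
  else
    key="$b|$a|$5"
  fi
  # cksum (a POSIX CRC) stands in for the unpublished hash.
  crc=$(printf '%s' "$key" | cksum | cut -d' ' -f1)
  echo $(( $crc % $6 ))
)

# Forward and return directions of one connection, four appliances.
forward=$(flow_target 10.1.0.5 44321 203.0.113.10 443 tcp 4)
reverse=$(flow_target 203.0.113.10 443 10.1.0.5 44321 tcp 4)
[ "$forward" = "$reverse" ] && echo "both directions map to target $forward"
```

Passing the same fixed port value (for example 0) on both sides approximates the 3-tuple mode: every connection between the two addresses then lands on one target regardless of port.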

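Layer 2 — Transit Gateway appliance mode. Without appliance mode, the TGW keeps traffic in the Availability Zone it originated from when it forwards into a VPC attachment. In a multi-AZ topology that means the forward leg of a connection enters the inspection VPC in the source’s AZ and the return leg enters in the responder’s AZ, so whenever the two sit in different AZs the two directions arrive at different GWLB endpoints. If the forward path enters via the us-east-1a GWLB endpoint and the return enters via us-east-1b, you are on a different appliance fleet entirely and symmetry is gone before GWLB even gets a chance. Appliance mode on the inspection VPC’s TGW attachment fixes this: the TGW uses a flow hash to pick one network interface in the inspection VPC and sends both directions of the flow to that AZ for the life of the flow.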

# Appliance mode is mandatory for multi-AZ stateful inspection.
# Without it, the TGW splits forward/return across AZs and the
# stateful engine drops out-of-state return packets.
aws ec2 modify-transit-gateway-vpc-attachment \
  --transit-gateway-attachment-id tgw-attach-inspection \
  --options ApplianceModeSupport=enable

The two layers compose: appliance mode keeps a flow in one AZ, and GWLB flow hashing keeps it on one appliance within that AZ. Skip either and you get intermittent, maddening connectivity loss that looks like “random” drops but is actually asymmetric routing.

7. Health checks, horizontal scaling, and graceful draining

GWLB health-checks every registered target. An unhealthy appliance is pulled from the hashing rotation and its share of new flows redistributes across the survivors. As with the routed NVA case, health-check what the data plane actually serves, not just OS liveness — an appliance whose GENEVE/inspection engine has hung while SSH still answers is a black hole.
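How long a hung appliance keeps receiving flows follows from the target group's health-check settings. As a rough sketch, assuming detection takes about one interval per failed check and ignoring per-check timeouts and jitter, the window is the interval multiplied by the unhealthy threshold. The helper name detection_window is hypothetical.

```shell
# detection_window INTERVAL_SECONDS UNHEALTHY_THRESHOLD
# Approximate seconds until a dead appliance is marked unhealthy: it must
# fail UNHEALTHY_THRESHOLD consecutive checks, one per interval.
detection_window() {
  echo $(( $1 * $2 ))
}

# Values from the target group created in section 4 (10 s interval, 3 failures).
detection_window 10 3
# -> 30
```

During that window new flows can still hash onto the dead appliance, which is the practical argument for short intervals on a health-check port tied to the data plane.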

Scale horizontally by putting the appliances in an Auto Scaling group registered to the target group. New instances register and start taking flows; terminating instances should drain first. GWLB respects target-group deregistration delay (connection draining): in-flight flows on a draining target are allowed to complete, within the delay window, before the target is fully removed.

# Give draining appliances time to finish in-flight flows before
# they are removed from the GWLB rotation.
aws elbv2 modify-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:...:targetgroup/tg-inspection-appliances/... \
  --attributes Key=deregistration_delay.timeout_seconds,Value=120

A few operational realities:

New flows go to healthy targets; existing flows on a failed target are not migrated unless target_failover.on_unhealthy=rebalance. Even then, a stateful appliance without session-state sync treats the rebalanced flow as new and may reset it. Decide this behavior on purpose.
Scale on the right metric. CPU on the appliance, or GWLB HealthyHostCount and active-flow counts, are better triggers than raw network throughput, which can be bursty.
Pre-warm for known spikes. GWLB scales, but appliance boot, rule load, and passing the health checks take minutes. For predictable load (a batch window, a launch), scale out ahead of time.

8. Cross-zone behavior, MTU, and the cost of double processing

Three production realities you must design for.

Cross-zone load balancing. GWLB has cross-zone load balancing disabled by default, and for inline inspection you usually want to keep it that way. With it disabled, a GWLB node in an AZ only sends to targets in the same AZ — which, combined with per-AZ GWLB endpoints and TGW appliance mode, keeps a flow’s data path inside one AZ end to end. Enabling cross-zone spreads flows across AZs’ appliances and incurs cross-AZ data transfer charges on the inspected traffic, which, given GWLB processes traffic twice (in and out), gets expensive fast. Enable it only if you have a specific imbalance to solve and you accept the cost.

MTU and the GENEVE overhead. AWS documents the GENEVE encapsulation overhead as 68 bytes (outer IP + UDP + GENEVE header with options) on top of the original packet. The GWLB-to-appliance path must carry the encapsulated frame. GWLB accepts packets up to 8500 bytes, so an appliance that has to pass full-size packets needs an interface MTU of at least 8568. If the original packet is already at the path MTU and the appliance cannot accommodate the larger encapsulated frame, you get fragmentation or silent drops of full-size packets. Mitigations: ensure the appliance interfaces support the larger frame (jumbo frames where available), and verify the appliance’s GENEVE handler does not itself further fragment. Path MTU issues here present as “small packets work, large transfers stall.”
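The MTU arithmetic is simple enough to script into a pre-flight check. This sketch assumes the 68-byte GENEVE overhead that AWS documents for GWLB; treat the constant as an assumption to confirm against current documentation. The function names are hypothetical.

```shell
# Encapsulation overhead in bytes (assumed from AWS GWLB documentation).
GENEVE_OVERHEAD_BYTES=68

# max_inner_packet APPLIANCE_MTU
# Largest original packet that still fits once GENEVE is added.
max_inner_packet() {
  echo $(( $1 - $GENEVE_OVERHEAD_BYTES ))
}

# required_appliance_mtu INNER_PACKET_SIZE
# Interface MTU the appliance needs to receive that packet encapsulated.
required_appliance_mtu() {
  echo $(( $1 + $GENEVE_OVERHEAD_BYTES ))
}

max_inner_packet 1500        # -> 1432
required_appliance_mtu 8500  # -> 8568
```

An appliance interface left at a 1500-byte MTU would therefore pass small packets and stall on anything above 1432 bytes, which matches the symptom described above.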

The cost of double processing. Every byte of inspected traffic crosses the GWLB twice (once into the appliance, once back out), and you pay for it several times over: GWLB hourly and capacity-unit charges that grow with processed bytes, the GWLB endpoint hourly + per-GB charge, the appliance compute, and any NAT and cross-AZ transfer. This is real money at scale. The architectural lever is scope: do not force traffic you do not need to inspect through the GWLB. Intra-spoke, same-subnet, or trusted backplane traffic can bypass inspection via more specific routes, reserving the expensive inspected path for what actually warrants it.

Enterprise scenario

A SaaS platform team standardized on FortiGate as their mandated inspection appliance and adopted the centralized GWLB model: one inspection VPC, FortiGate fleet in a target group, GWLB endpoints, three spokes behind a Transit Gateway with appliance mode enabled. East-west and egress inspection passed every functional test in a single AZ. They went multi-AZ for resilience and immediately saw intermittent failures on long-lived connections: roughly a third of spoke egress flows would establish, run for a while, then stall. Short connections looked fine, which sent everyone down the wrong debugging path (TLS? idle timeouts? appliance bug?).

The constraint was real symmetry across both the TGW and GWLB layers, and they had the TGW layer right — appliance mode was on. What they missed was a route-table asymmetry in the inspection VPC. The egress path (spoke -> TGW -> GWLBE-az -> appliance -> NAT) was correct per AZ. But the NAT subnet’s return route for spoke CIDRs had been written to point at the us-east-1a GWLB endpoint for all AZs, copied across route tables during a hurried Terraform refactor. So return traffic that had egressed via the az2 NAT was being shoved back through the az1 endpoint and az1 appliance fleet, while the forward flow lived on az2 — classic asymmetry, but only for flows that happened to hash onto az2 on the way out. The “random one-third” was exactly the flows whose forward AZ did not match the hard-coded return AZ.

The fix was to make the return route in each NAT subnet reference its own AZ’s GWLB endpoint, restoring per-AZ symmetry:

# us-east-1b NAT subnet: return-to-spoke traffic must use the
# LOCAL (az2) GWLB endpoint, not a hard-coded az1 endpoint.
aws ec2 create-route \
  --route-table-id rtb-nat-az2 \
  --destination-cidr-block 10.0.0.0/8 \
  --vpc-endpoint-id vpce-0bb22cc33dd44ee55   # az2 endpoint, not az1

The lesson generalizes: GWLB and TGW appliance mode guarantee symmetry only for the path you route symmetrically. Per-AZ GWLB endpoints mean per-AZ route tables, and every route that hairpins through “the GWLB endpoint” must hairpin through the local one. A copy-paste across AZs is all it takes to split a flow.

Verify

Confirm the data plane before you trust it.

1. Target group is GENEVE/6081 and targets are healthy:

aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:...:targetgroup/tg-inspection-appliances/... \
  --query "TargetHealthDescriptions[].{id:Target.Id, state:TargetHealth.State}"
# expect every target State = healthy

2. The GWLB is type gateway and has its listener:

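aws elbv2 describe-load-balancers --names gwlb-inspection \
  --query "LoadBalancers[0].{type:Type, state:State.Code, azs:AvailabilityZones[].ZoneName}"
# expect type=gateway, state=active
aws elbv2 describe-listeners \
  --load-balancer-arn arn:aws:elasticloadbalancing:...:loadbalancer/gwy/gwlb-inspection/... \
  --query "Listeners[].DefaultActions[0].Type"   # expect exactly one entry: "forward"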

3. Appliance mode is enabled on the inspection TGW attachment:

aws ec2 describe-transit-gateway-vpc-attachments \
  --transit-gateway-attachment-ids tgw-attach-inspection \
  --query "TransitGatewayVpcAttachments[0].Options.ApplianceModeSupport"
# expect: "enable"

4. Each AZ’s route tables reference the local GWLB endpoint:

# Confirm the az1 NAT/TGW route tables point at the az1 endpoint id,
# az2 at the az2 endpoint id. Mismatch here = asymmetry.
# Routes that target a GWLB endpoint report the vpce- id in GatewayId.
aws ec2 describe-route-tables \
  --route-table-ids rtb-nat-az1 rtb-nat-az2 \
  --query "RouteTables[].{rt:RouteTableId, routes:Routes[?GatewayId && starts_with(GatewayId, 'vpce-')].[DestinationCidrBlock,GatewayId]}"

5. Appliance security group allows GENEVE:

# The appliance SG must allow inbound UDP 6081 from the GWLB.
aws ec2 describe-security-groups --group-ids sg-appliance \
  --query "SecurityGroups[0].IpPermissions[?ToPort==\`6081\` && IpProtocol=='udp']"

6. Marked-traffic symmetry test (the real proof):

From a spoke instance, send identifiable traffic to an external host — for example curl https://example.com/ with a unique user-agent, or a sustained ping/iperf3 to a marked destination.
On the appliance, capture the inner packet and confirm it shows the real spoke source IP and the real destination (not a NATed or GWLB address). On a FortiGate/Palo Alto you can use the on-box capture; on a Linux appliance, tcpdump -ni <data-if> 'udp port 6081' shows the GENEVE outer, and decoding the inner confirms the original 5-tuple.
Confirm the same flow appears on exactly one appliance, and that both the request and the response are seen on that same appliance. Seeing the forward on appliance A and the return on appliance B is the asymmetry failure you are testing for.
Kill the inspection service on the appliance carrying the flow; confirm GWLB marks it unhealthy within the health-check window and new flows succeed via a survivor.

Pitfalls

Treating the appliance like a routed NVA. It is a GENEVE tunnel endpoint, single-arm, not a two-NIC router. Configuring it as a forwarding hop and pointing VPC route tables at its interfaces is the wrong model; routes point at the GWLB endpoint.
Forgetting UDP 6081 on the appliance SG. The fleet registers, health checks may even pass on a separate port, and no GENEVE traffic ever lands. Silent.
One endpoint, many AZs. Pointing every AZ’s routes at a single GWLB endpoint defeats appliance mode’s per-AZ pinning and forces cross-AZ hops and asymmetry.
Asymmetric return routing. The most common failure: forward steered through the GWLB, return taking a path that skips it or hits a different AZ’s endpoint. Stateful appliances drop the out-of-state half.
Enabling cross-zone load balancing “for resilience” without realizing it spreads inspected traffic across AZs and bills you cross-AZ transfer on every doubly-processed byte.
Inspecting everything. GWLB processes traffic twice; forcing trusted east-west or intra-subnet traffic through it multiplies cost for no security benefit. Scope it.

Next Steps

Codify the entire pattern — GWLB, target group, endpoint service, per-AZ GWLB endpoints, every route table, the TGW attachment with appliance mode — in Terraform or CloudFormation so the symmetry-critical per-AZ routes cannot drift by hand. Alarm on GWLB HealthyHostCount and ActiveFlowCount, and on appliance CPU, so you see a degraded fleet before customers do. Run the marked-traffic symmetry test on a schedule, not just at go-live, because a single route-table refactor can silently re-introduce the asymmetry you fought so hard to eliminate.