DNS-based global load balancing is the default answer for multi-region traffic, and for HTTP it is usually fine. For TCP and UDP workloads where the protocol cannot be reverse-proxied at L7 – game servers, MQTT brokers, SIP, custom binary protocols, anything that needs source-IP preservation – DNS-based GSLB is the wrong tool. Resolver caching makes failover slow and unpredictable, and the client still traverses the public internet to reach your origin. This article builds the alternative: anycast static IPs that get clients onto the AWS backbone at the nearest point of presence (PoP), with endpoint-group weighting and sub-minute health-based failover. The reference implementation is AWS Global Accelerator, but the pattern – one IP, many PoPs, BGP-driven nearest entry – applies to any anycast accelerator.
1. Anycast fundamentals: one IP, many PoPs
Anycast means a single IP prefix is announced via BGP from many physical locations at once. Every PoP advertises the same /24 (or in the accelerator case, the same two /32-reachable addresses out of provider-owned space), and the global routing table resolves each client to whichever PoP is closest in BGP terms – fewest AS hops, best local preference – which in practice is the nearest edge.
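To make "closest in BGP terms" concrete, here is a toy sketch of the selection, not a model of any real router. The PoP names, local-preference values, and AS-path lengths are invented for illustration, and real BGP applies several more tie-breakers than the two shown.

```shell
#!/usr/bin/env bash
# Toy best-path selection among PoPs announcing the same anycast prefix.
# Reads lines of "<pop> <local_pref> <as_path_len>" on stdin (values are made up).
# Real BGP applies further tie-breakers (origin, MED, IGP cost, router ID).
pick_pop() {
  # Highest local preference first, then shortest AS path.
  sort -k2,2nr -k3,3n | head -n 1 | cut -d' ' -f1
}

# Candidate routes as seen from a hypothetical Frankfurt ISP.
printf '%s\n' 'iad 100 4' 'fra 100 1' 'nrt 100 6' | pick_pop
# prints: fra
```

The client plays no part in this choice: the decision is made hop by hop in the routing layer, which is why the same destination IP can resolve to different PoPs for different networks.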
The consequence that matters: the client never picks a region. It sends a SYN to one static IP, and the packet lands at the nearest PoP regardless of where your application actually runs. The accelerator terminates that connection at the edge and carries the traffic to your origin over the provider’s private backbone, not the public internet.
client (Frankfurt) --SYN--> 75.2.x.x (anycast)
|
nearest PoP (Frankfurt edge) <- BGP decides this
|
AWS backbone (private, congestion-managed)
|
endpoint group eu-central-1 (NLB / EC2 / EIP)
Two properties fall out of this design and they are the whole reason to use it:
- The internet portion of the path is minimized. The client reaches the edge over a short public hop, then rides the backbone the rest of the way. Backbone paths have lower jitter and loss than transit-provider paths across oceans.
- Entry point is decoupled from origin. You can move, add, or drain a region without the client ever seeing a different IP.
Global Accelerator gives you two static anycast IPv4 addresses (you can also enable dual-stack to add IPv6) from the Amazon IP pool, or you can bring your own (BYOIP) prefix. Those addresses do not change for the life of the accelerator, which is exactly what lets you hardcode them in clients, firewall allowlists, and license servers that cannot follow a CNAME.
2. Why anycast beats DNS GSLB for TCP/UDP
DNS-based steering (Route 53 latency/geo records, Traffic Manager, NS1) never sees a packet. It hands the client an address and the client connects directly. That indirection is the source of every problem for stateful L4 traffic.
| Dimension | DNS GSLB | Anycast accelerator |
|---|---|---|
| Steering signal | DNS answer (A/AAAA) | BGP route to nearest PoP |
| Failover gate | TTL + resolver caching + probe interval | Edge health check, no client cache |
| Realistic failover time | tens of seconds to minutes | sub-minute, often seconds |
| Client path | public internet, full distance | short hop to edge, then backbone |
| Source IP at origin | client IP (direct) | client IP (preserved) or edge IP, your choice |
| Hardcodable IP | no, clients cache stale answers | yes, static for the accelerator’s life |
The failover row is the decisive one. With DNS, a client that resolved your record 20 seconds ago keeps hammering the dead endpoint until its cache expires – and you cannot control resolver behavior, only your own TTL. Java’s networkaddress.cache.ttl caches successful lookups forever when a security manager is installed, and many long-lived clients resolve a name once and never look it up again; plenty of clients ignore your 30-second TTL entirely. With anycast the client holds a connection to an IP that never moves. When a regional endpoint fails its health check, the edge stops forwarding new flows there and reroutes to the next-best endpoint group. No DNS change, no client cache to wait on.
For a request-response HTTP API you might not care. For a 4-hour game session, a long-lived MQTT subscription, or a financial market-data feed, a 90-second window of clients stuck on a dead region is an incident.
3. Standing up the accelerator and getting clients onto the backbone
Create the accelerator first. It allocates the two static anycast IPs immediately; everything else hangs off it.
# Global Accelerator is a global service; the control plane lives in us-west-2.
aws globalaccelerator create-accelerator \
--name edge-tcp-prod \
--ip-address-type IPV4 \
--enabled \
--region us-west-2
The response includes the AcceleratorArn and the two static addresses under IpSets[].IpAddresses; the accelerator’s Status moves from IN_PROGRESS to DEPLOYED once provisioning completes. Capture the ARN:
ACCEL_ARN=$(aws globalaccelerator list-accelerators \
--region us-west-2 \
--query "Accelerators[?Name=='edge-tcp-prod'].AcceleratorArn" \
--output text)
A listener defines the ports and protocol that the anycast IPs accept. Here we take TCP 443 and a UDP range for a game protocol. client-affinity SOURCE_IP pins all flows from a given client IP to the same endpoint, which is essential for protocols with server-side session state.
aws globalaccelerator create-listener \
--accelerator-arn "$ACCEL_ARN" \
--protocol TCP \
--port-ranges FromPort=443,ToPort=443 \
--client-affinity SOURCE_IP \
--region us-west-2
aws globalaccelerator create-listener \
--accelerator-arn "$ACCEL_ARN" \
--protocol UDP \
--port-ranges FromPort=30000,ToPort=30010 \
--client-affinity SOURCE_IP \
--region us-west-2
At this point the anycast IPs are live and accepting connections at every PoP, but they have nowhere to send traffic. That is the endpoint group’s job.
4. Endpoint groups, traffic dials, and per-region weighting
An endpoint group is the per-region landing zone. Each group lives in one AWS Region and contains one or more endpoints – Network Load Balancers, Application Load Balancers, EC2 instances, or Elastic IPs. Global Accelerator routes a flow to the closest healthy endpoint group, then load-balances across endpoints within it by weight.
Two independent knobs control distribution, and confusing them is the most common mistake:
- Traffic dial (traffic-dial-percentage, 0-100): a ceiling on how much of the traffic that would arrive at this group is actually allowed in. It does not move traffic to another region; it caps this one and lets overflow spill to the next-nearest group. Use it to drain a region gracefully or cap a smaller region’s blast radius.
- Endpoint weight (Weight, 0-255): relative share within a group across its endpoints. Weight 0 is a clean way to remove an endpoint from rotation without deleting it.
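Endpoint weights are relative, so the share each endpoint receives is its weight divided by the sum of weights in the group. The helper below is a minimal sketch of that arithmetic; the function name is hypothetical and the percentages are rounded down to whole numbers.

```shell
#!/usr/bin/env bash
# Print each endpoint's share of its endpoint group's traffic, in whole percent.
# Arguments: the Weight value (0-255) of every endpoint in one group.
endpoint_shares() {
  local total=0 w
  for w in "$@"; do total=$((total + w)); done
  for w in "$@"; do
    if [ "$total" -eq 0 ]; then
      echo "weight=$w share=0%"          # avoid dividing by zero
    else
      echo "weight=$w share=$((w * 100 / total))%"
    fi
  done
}

endpoint_shares 128 64 0
# weight=128 share=66%
# weight=64 share=33%
# weight=0 share=0%
```

Note that a single endpoint at weight 128 and a single endpoint at weight 255 behave identically: with only one endpoint in the group, it takes the whole share either way.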
LISTENER_ARN=$(aws globalaccelerator list-listeners \
--accelerator-arn "$ACCEL_ARN" \
--region us-west-2 \
--query "Listeners[?Protocol=='TCP'].ListenerArn" \
--output text)
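# Primary region, full traffic dial, NLB endpoint.
# The health-check flags only drive probes for EC2/EIP endpoints; for a load
# balancer endpoint, health comes from its own target-group checks.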
aws globalaccelerator create-endpoint-group \
--listener-arn "$LISTENER_ARN" \
--endpoint-group-region us-east-1 \
--traffic-dial-percentage 100 \
--endpoint-configurations '[{"EndpointId":"arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/net/prod-nlb/abc123","Weight":128}]' \
--health-check-port 443 \
--health-check-protocol TCP \
--health-check-interval-seconds 10 \
--threshold-count 3 \
--region us-west-2
# Secondary region, capped at 50% during ramp-up.
aws globalaccelerator create-endpoint-group \
--listener-arn "$LISTENER_ARN" \
--endpoint-group-region eu-central-1 \
--traffic-dial-percentage 50 \
--endpoint-configurations '[{"EndpointId":"arn:aws:elasticloadbalancing:eu-central-1:111122223333:loadbalancer/net/prod-nlb-eu/def456","Weight":128}]' \
--health-check-port 443 \
--health-check-protocol TCP \
--health-check-interval-seconds 10 \
--threshold-count 3 \
--region us-west-2
Traffic dial is not a weight between regions. If you want 70/30 across two regions for capacity reasons, do not set dials to 70 and 30 – that simply caps both and clients near each region still hit their local one first. Cross-region weighting at L4 is done by managing which clients are geographically near which group, plus dials to force spillover. If you need precise percentage splits across regions independent of geography, you are fighting the design and should reconsider whether anycast is the right layer.
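A simplified sketch of why dials of 70 and 30 do not yield a 70/30 split. It assumes a made-up population of 1,000 flows, 900 nearest to region A and 100 nearest to region B, and models only how much of its local traffic each group admits. Where the spilled flows end up, and how the receiving group's own dial treats them, is deliberately not modeled.

```shell
#!/usr/bin/env bash
# Simplified traffic-dial model for one endpoint group.
# dial_split <flows_nearest_to_group> <dial_percentage>
dial_split() {
  local flows=$1 dial=$2
  local accepted=$((flows * dial / 100))
  echo "accepted=$accepted spilled=$((flows - accepted))"
}

# 900 flows are geographically nearest to region A, 100 to region B.
dial_split 900 70   # region A: accepted=630 spilled=270
dial_split 100 30   # region B: accepted=30 spilled=70
```

Region A keeps 630 of the 1,000 flows because of where the clients are, not because of any ratio between the two dials. Change the client geography and the same dial settings produce a different split.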
5. Health checks and sub-minute failover for stateful protocols
For endpoints the accelerator probes itself, failover speed is health-check-interval-seconds x threshold-count, plus a few seconds of propagation. With a 10-second interval and a threshold of 3, an endpoint is declared unhealthy roughly 30 seconds after it stops responding, and new flows then route to the next-best group. The interval is either 10 or 30 seconds and the threshold range is 1-10, so the floor is about 10 seconds.
For ALB and NLB endpoints, Global Accelerator does not run its own probe – it inherits the load balancer’s target-group health, which is correct because the load balancer already knows the real backend state. The health-check options set on the endpoint group have no effect on those endpoints, so their detection time is governed by the target group’s own interval and thresholds. For EC2 and EIP endpoints, the accelerator runs the TCP/HTTP/HTTPS check configured on the endpoint group.
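The detection arithmetic for accelerator-run health checks can be sketched as a small helper. The function name and the optional propagation argument are assumptions for illustration; the validation reflects the interval values (10 or 30 seconds) and threshold range (1-10) that Global Accelerator accepts.

```shell
#!/usr/bin/env bash
# Estimate seconds until an accelerator-probed endpoint is marked unhealthy.
# failover_seconds <interval> <threshold> [propagation_seconds]
failover_seconds() {
  local interval=$1 threshold=$2 propagation=${3:-0}
  case "$interval" in
    10|30) ;;
    *) echo "interval must be 10 or 30" >&2; return 1 ;;
  esac
  if [ "$threshold" -lt 1 ] || [ "$threshold" -gt 10 ]; then
    echo "threshold must be between 1 and 10" >&2
    return 1
  fi
  echo $((interval * threshold + propagation))
}

failover_seconds 10 3      # 30
failover_seconds 30 3      # 90
failover_seconds 10 1 5    # 15, with an assumed 5 s of propagation
```

A threshold of 1 gives the fastest detection but also flaps on a single dropped probe, so the 10 x 3 setting used in this article is a reasonable middle ground rather than a hard rule.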
A subtlety that bites stateful protocols: when an endpoint fails, in-flight connections to it are not gracefully migrated – the dead endpoint is dead, and those TCP sessions reset. Anycast failover protects new flows and reconnections, not the connection that was mid-flight on the failed box. For a game or trading client this means a reconnect, but the reconnect lands on a healthy region in seconds rather than minutes. Design the client to reconnect to the same anycast IP on reset; it will be steered to a live endpoint automatically.
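The reconnect behavior described above might look like the following sketch: retry against the same address with exponential backoff until a connect attempt succeeds. The function name is hypothetical, and the connect command is passed in as an argument because the real handshake depends on your protocol.

```shell
#!/usr/bin/env bash
# Retry a connect command with exponential backoff, always against the same address.
# reconnect <max_attempts> <base_delay_seconds> <command> [args...]
reconnect() {
  local max=$1 delay=$2 attempt=1
  shift 2
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      echo "connected on attempt $attempt"
      return 0
    fi
    if [ "$attempt" -lt "$max" ]; then
      sleep "$delay"
      delay=$((delay * 2))     # e.g. 1s, 2s, 4s, ...
    fi
    attempt=$((attempt + 1))
  done
  echo "gave up after $max attempts" >&2
  return 1
}

# Hypothetical usage (address and port are placeholders; substitute the
# client's own handshake for the nc probe):
#   reconnect 6 1 nc -z -w 3 75.2.0.1 443
```

The point of the design is that the target never changes between attempts. There is no re-resolution step and no region list in the client; steering to a live endpoint happens at the edge.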
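# Look up the primary endpoint group, then set a ~30s detection window.
EG_ARN=$(aws globalaccelerator list-endpoint-groups \
--listener-arn "$LISTENER_ARN" \
--region us-west-2 \
--query "EndpointGroups[?EndpointGroupRegion=='us-east-1'].EndpointGroupArn" \
--output text)
aws globalaccelerator update-endpoint-group \
--endpoint-group-arn "$EG_ARN" \
--health-check-interval-seconds 10 \
--threshold-count 3 \
--region us-west-2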
6. Client affinity and source-IP preservation
Two distinct concerns, both critical for L4.
Client affinity controls stickiness. The default, NONE, hashes the 5-tuple (source IP, source port, dest IP, dest port, protocol) so a single client’s multiple connections may land on different endpoints. SOURCE_IP hashes only on source and destination IP, pinning all of a client’s flows to one endpoint – which you want for any protocol holding per-client server state. Set it on the listener as shown in section 3.
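The difference between the two affinity modes is only which fields feed the hash. The sketch below illustrates that idea with cksum standing in for the accelerator's real hash function, which is an assumption made purely for demonstration; the addresses and the endpoint count of 4 are placeholders.

```shell
#!/usr/bin/env bash
# Illustration only: cksum is a stand-in for the accelerator's actual hash.
# pick_endpoint <affinity> <src_ip> <src_port> <dst_ip> <dst_port> <proto> <endpoint_count>
pick_endpoint() {
  local affinity=$1 src_ip=$2 src_port=$3 dst_ip=$4 dst_port=$5 proto=$6 count=$7
  local key hash
  if [ "$affinity" = "SOURCE_IP" ]; then
    key="$src_ip|$dst_ip"                              # 2-tuple
  else
    key="$src_ip|$src_port|$dst_ip|$dst_port|$proto"   # 5-tuple
  fi
  hash=$(printf '%s' "$key" | cksum | cut -d' ' -f1)
  echo $((hash % count))
}

# Two connections from one client that differ only in source port.
a=$(pick_endpoint SOURCE_IP 203.0.113.7 50001 75.2.0.1 443 tcp 4)
b=$(pick_endpoint SOURCE_IP 203.0.113.7 50002 75.2.0.1 443 tcp 4)
echo "SOURCE_IP: $a $b"   # always the same index twice
c=$(pick_endpoint NONE 203.0.113.7 50001 75.2.0.1 443 tcp 4)
d=$(pick_endpoint NONE 203.0.113.7 50002 75.2.0.1 443 tcp 4)
echo "NONE: $c $d"        # indexes may differ, since the port is part of the key
```

One side effect worth keeping in mind: with SOURCE_IP, many clients behind a single NAT address all hash to the same endpoint, so stickiness can concentrate load.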
Source-IP preservation controls what address your application sees as the client. Enable it and your backend sees the real client IP; disable it and the backend sees a Global Accelerator edge IP. Preservation is what lets origin-side security groups, geo-filtering, and audit logging keep working.
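# Preserve client IP on the endpoint (per-endpoint flag).
# update-endpoint-group replaces the group's whole endpoint list, so include
# every endpoint you want to keep, not just the one being changed.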
aws globalaccelerator update-endpoint-group \
--endpoint-group-arn "$EG_ARN" \
--endpoint-configurations '[{"EndpointId":"i-0abc123def456","Weight":128,"ClientIPPreservationEnabled":true}]' \
--region us-west-2
Client-IP preservation has hard constraints. It is supported for Internet-facing NLB, EC2-instance, and ALB endpoints, but not for internal NLBs, and not when the accelerator and endpoint cross certain boundaries. Critically, when preservation is on, the relevant security group must allow the client’s IP (and the Global Accelerator managed prefix for health checks via com.amazonaws.global.globalaccelerator), not an edge address. Get this wrong and traffic flows but health checks fail, or vice versa. Validate both paths before you trust it.
7. When to put regional L4/L7 balancers behind the accelerator
The accelerator is a global front door, not a regional load balancer. Inside each region you still want a balancer doing the per-target work:
- NLB behind the accelerator for pure L4 fan-out, preserved source IP, and millions of flows. This is the default for non-HTTP TCP/UDP.
- ALB behind the accelerator when you need L7 routing, TLS termination, or WAF and still want anycast entry and a static IP in front of an otherwise dynamic ALB. The accelerator inherits ALB target health and gives the ALB a fixed anycast address.
- EC2/EIP directly only for simple or single-instance cases; you lose intra-region balancing.
The clean mental model: Global Accelerator decides which region, the regional balancer decides which target. Health flows up – target health informs the regional LB, which (for ALB and NLB endpoints) informs the accelerator, which informs the routing decision. Never try to make the accelerator do per-target balancing; that is the regional LB’s job and it does it better.
Verify
Confirm the anycast IPs, prove the latency win, and force a failover.
1. Pull the live static IPs and endpoint health.
aws globalaccelerator describe-accelerator \
--accelerator-arn "$ACCEL_ARN" \
--region us-west-2 \
--query "Accelerator.IpSets[0].IpAddresses"
aws globalaccelerator describe-endpoint-group \
--endpoint-group-arn "$EG_ARN" \
--region us-west-2 \
--query "EndpointGroup.EndpointDescriptions[].{Id:EndpointId,Health:HealthState,Reason:HealthReason}"
All in-rotation endpoints should report HEALTHY.
2. Confirm you actually hit the nearest PoP and measure the win. Run from a client far from your origin region. Compare RTT to the accelerator IP against RTT to the origin’s regional public endpoint.
ACCEL_IP=75.2.0.1 # one of the two static IPs
# Anycast path (should hit a nearby PoP, then ride the backbone).
ping -c 20 "$ACCEL_IP" | tail -3
# Direct-to-region path for comparison (DNS GSLB-style).
ping -c 20 origin.eu-central-1.example.com | tail -3
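On a transcontinental client the anycast path typically shows materially lower and tighter RTT (lower mean and lower stddev) because the long leg is on the backbone. Treat ping as a first-pass signal only: the accelerator forwards just the listener’s TCP and UDP ports, so any ICMP reply from the anycast IP reflects the client-to-edge hop rather than the full path to the origin. For an end-to-end number, time an application-level request and response over the listener port on both paths. If the numbers are identical, your client may already be adjacent to the origin region – test from a genuinely distant location.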
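To compare mean and spread on equal terms, it helps to reduce each path's samples to the same two numbers. The helper below is a minimal sketch that takes one round-trip sample per line from any timer you trust and prints the mean and population standard deviation; the function name and the sample values are made up.

```shell
#!/usr/bin/env bash
# Summarize round-trip samples (one number per line, any unit) as mean and stddev.
rtt_stats() {
  awk '{ n++; sum += $1; sumsq += $1 * $1 }
       END {
         if (n == 0) { print "no samples"; exit 1 }
         mean = sum / n
         var = sumsq / n - mean * mean
         if (var < 0) var = 0            # guard against rounding error
         printf "n=%d mean=%.2f stddev=%.2f\n", n, mean, sqrt(var)
       }'
}

printf '%s\n' 10 12 14 | rtt_stats
# n=3 mean=12.00 stddev=1.63
```

Feed it one file of samples per path and compare the two summary lines. For an HTTP-speaking service, one possible sample source is curl's `%{time_starttransfer}` write-out variable; for a custom protocol, use the client's own request timer.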
3. Force a regional failover and time it. The honest test is to break the endpoint, not the health check config.
# Block the health-check + service port on the primary endpoint's SG,
# or stop the primary NLB targets. Then watch health flip.
# watch runs the quoted command in a child shell, so EG_ARN must be exported.
export EG_ARN
watch -n 5 'aws globalaccelerator describe-endpoint-group \
--endpoint-group-arn "$EG_ARN" --region us-west-2 \
--query "EndpointGroup.EndpointDescriptions[].HealthState" --output text'
With a 10s interval and threshold 3 you should see HEALTHY -> UNHEALTHY within ~30 seconds for endpoints the accelerator probes itself; for load-balancer endpoints the flip follows the target group’s own health-check timing. A client that reconnects to the same anycast IP should then land in the secondary region. Time the client-observed recovery, not just the console state change – that end-to-end number is what your SLA is made of.
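One way to capture the client-observed number is to run a probe loop from a real client location while you break the endpoint. The sketch below waits for the first failed probe, then reports how long it takes for a probe to succeed again. The function name is hypothetical, and the probe is passed in as a command; it should be an application-level check over the listener port, since a probe that stops at the edge will not notice a dead origin.

```shell
#!/usr/bin/env bash
# Time the gap between the first failed probe and the next successful one.
# measure_recovery <interval_seconds> <probe_command> [args...]
measure_recovery() {
  local interval=$1
  shift
  local down_at up_at
  # Phase 1: keep probing until the service stops answering.
  while "$@"; do sleep "$interval"; done
  down_at=$(date +%s)
  # Phase 2: keep probing until it answers again.
  until "$@"; do sleep "$interval"; done
  up_at=$(date +%s)
  echo "client-observed recovery: $((up_at - down_at))s"
}

# Hypothetical usage, where app_probe is your own request/response check:
#   measure_recovery 1 app_probe 75.2.0.1 443
```

The resolution of the result is bounded by the probe interval plus the probe's own timeout, so keep both short relative to the failover window you expect.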
4. Check CloudWatch for flow distribution. Global Accelerator publishes NewFlowCount, ProcessedBytesIn/Out, and per-endpoint-group metrics. Confirm traffic actually shifted regions during the failover window.
Enterprise scenario
A multiplayer game studio ran authoritative UDP game servers in us-east-1 and eu-central-1 behind Route 53 latency records. During a partial us-east-1 AZ impairment, players already mid-match kept their UDP sessions pinned to the failing region because their clients had cached the A record at match-join and the SDK’s resolver TTL was effectively ignored. Sessions degraded for two to three minutes – well past the point players rage-quit – even though the EU region was healthy the whole time. Their constraint: the game protocol is UDP with server-authoritative per-match state, so HTTP-based global balancing was a non-starter, and they could not push a client patch quickly.
They moved the public entry to Global Accelerator. Each region got an endpoint group fronting an internal NLB pool of game servers; the listener used UDP with SOURCE_IP affinity so a player’s packets pinned to one server for the match, and health checks ran at a 10-second interval with threshold 3. The client SDK already reconnected to the same address on packet-loss timeout – and because that address was now a static anycast IP, reconnects were steered to a healthy region in seconds with no DNS in the path. They also raised the new region’s traffic dial gradually to validate capacity before taking full load.
# UDP listener with per-player stickiness for match state.
aws globalaccelerator create-listener \
--accelerator-arn "$ACCEL_ARN" \
--protocol UDP \
--port-ranges FromPort=30000,ToPort=30010 \
--client-affinity SOURCE_IP \
--region us-west-2
The measured result: client-observed regional failover dropped from 2-3 minutes to under 30 seconds, mid-match disconnects during the next regional event were a brief reconnect rather than a multi-minute outage, and median RTT for distant players fell because the long leg now rode the AWS backbone instead of public transit. The one behavior they documented loudly for the on-call team: in-flight UDP flows on the failed servers still reset – anycast protects the reconnect, not the connection that was live on the dead box.