Designing Multi-Account VPC Connectivity with Transit Gateway and Centralized Egress

A two-account network you can mesh by hand. A forty-account estate with prod/non-prod isolation, a single internet egress point, and on-prem connectivity is an architecture problem. Get the hub wrong early and you inherit a peering mess, overlapping CIDRs you can never renumber, and a flat network where any compromised workload can reach every other one. This is how to build the hub correctly the first time.

Topology: Transit Gateway vs. peering vs. PrivateLink

These three are not competitors; they solve different problems. Pick deliberately.

| Pattern | Connectivity model | Transitive routing | Best for |
|---|---|---|---|
| VPC peering | 1:1, full IP reachability | No | A handful of VPCs, lowest latency, no per-GB hub fee |
| Transit Gateway | Hub-and-spoke, regional router | Yes (policy-controlled) | Many VPCs/accounts, segmentation, hybrid, central egress |
| PrivateLink | Service endpoint (one NIC) | N/A | Exposing a single service across a trust boundary without IP routing |

Peering does not scale: N VPCs need N(N-1)/2 connections (a 40-VPC estate would need 780), and peering is non-transitive, so spoke A cannot reach spoke C through B. PrivateLink is the right tool when you want to share one application (an internal API, a SaaS endpoint) without granting network-layer reachability, and it sidesteps CIDR overlap entirely. For everything-talks-to-everything-under-policy across accounts, the Transit Gateway (TGW) is the answer. The rest of this guide builds that hub.

A TGW is a regional resource. A global estate needs one TGW per region, joined with inter-region peering attachments. Plan accounts and CIDRs with that boundary in mind from day one.
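As a rough sketch of that inter-region join, assuming a second hub already exists in us-east-1: the variables, the aws.use1 provider alias, and the 10.64.0.0/12 remote block below are all illustrative names, not part of the original setup.

```hcl
# Hypothetical inputs: the two regional hubs and the local route table to update.
variable "local_tgw_id" { type = string }         # eu-west-1 hub
variable "peer_tgw_id" { type = string }          # us-east-1 hub
variable "local_route_table_id" { type = string } # TGW route table in eu-west-1

provider "aws" {
  alias  = "use1"
  region = "us-east-1"
}

# Requested from the eu-west-1 side
resource "aws_ec2_transit_gateway_peering_attachment" "euw1_use1" {
  transit_gateway_id      = var.local_tgw_id
  peer_transit_gateway_id = var.peer_tgw_id
  peer_region             = "us-east-1"
  tags                    = { Name = "peer-euw1-use1" }
}

# Accepted from the us-east-1 side
resource "aws_ec2_transit_gateway_peering_attachment_accepter" "use1" {
  provider                      = aws.use1
  transit_gateway_attachment_id = aws_ec2_transit_gateway_peering_attachment.euw1_use1.id
}

# Peering attachments do not propagate routes, so each side needs static routes
resource "aws_ec2_transit_gateway_route" "to_use1" {
  destination_cidr_block         = "10.64.0.0/12" # assumed block for the remote region
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_peering_attachment.euw1_use1.id
  transit_gateway_route_table_id = var.local_route_table_id

  depends_on = [aws_ec2_transit_gateway_peering_attachment_accepter.use1]
}
```

Because routes across a peering are static, summarizable per-region blocks keep this to a handful of entries per table.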

Step 1 — A non-overlapping CIDR plan with IPAM

The single most expensive mistake in multi-account networking is CIDR overlap. The TGW route table is a longest-prefix-match router; two VPCs advertising 10.0.0.0/16 cannot both be routed, and you cannot renumber a live VPC. Solve allocation centrally before anyone provisions a VPC.

Use AWS IPAM as the source of truth. Carve a top-level pool, then per-environment and per-region pools beneath it, and force every VPC to draw from IPAM.

resource "aws_vpc_ipam" "main" {
  operating_regions { region_name = "eu-west-1" }
}

resource "aws_vpc_ipam_pool" "top" {
  address_family = "ipv4"
  ipam_scope_id  = aws_vpc_ipam.main.private_default_scope_id
  locale         = "eu-west-1"
}

resource "aws_vpc_ipam_pool_cidr" "top" {
  ipam_pool_id = aws_vpc_ipam_pool.top.id
  cidr         = "10.0.0.0/8"
}

# Environment pool: prod gets a /12 out of the /8
resource "aws_vpc_ipam_pool" "prod" {
  address_family      = "ipv4"
  ipam_scope_id       = aws_vpc_ipam.main.private_default_scope_id
  locale              = "eu-west-1"
  source_ipam_pool_id = aws_vpc_ipam_pool.top.id
}

resource "aws_vpc_ipam_pool_cidr" "prod" {
  ipam_pool_id   = aws_vpc_ipam_pool.prod.id
  netmask_length = 12
}

Spoke VPCs then allocate from the pool instead of hard-coding a block. IPAM guarantees uniqueness across every account it monitors:

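resource "aws_vpc" "spoke" {
  ipv4_ipam_pool_id    = aws_vpc_ipam_pool.prod.id
  ipv4_netmask_length  = 20 # IPAM hands out a free /20
  enable_dns_support   = true
  enable_dns_hostnames = true

  # The pool needs a provisioned CIDR before a VPC can allocate from it
  depends_on = [aws_vpc_ipam_pool_cidr.prod]
}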

Reserve disjoint super-blocks per environment so route-table summarization stays clean later — for example prod 10.16.0.0/12, non-prod 10.32.0.0/12, shared services 10.48.0.0/12. Reserve a separate block for on-prem so hybrid routes never collide.
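The arithmetic behind those super-blocks can be sketched with Terraform's built-in cidrsubnet function. The local names and the on-prem slot are assumptions for illustration; the three environment blocks match the example above.

```hcl
locals {
  org_supernet = "10.0.0.0/8"

  # cidrsubnet(prefix, newbits, netnum): /8 plus 4 new bits yields /12 blocks,
  # and netnum selects which of the sixteen /12s to take.
  super_blocks = {
    prod     = cidrsubnet(local.org_supernet, 4, 1)  # 10.16.0.0/12
    non_prod = cidrsubnet(local.org_supernet, 4, 2)  # 10.32.0.0/12
    shared   = cidrsubnet(local.org_supernet, 4, 3)  # 10.48.0.0/12
    on_prem  = cidrsubnet(local.org_supernet, 4, 12) # 10.192.0.0/12 (assumed slot, never given to an AWS pool)
  }
}

output "super_blocks" {
  value = local.super_blocks
}
```

If a pool has to land on a specific block rather than whichever /12 is free, provisioning it with an explicit cidr (for example cidr = local.super_blocks.prod) in aws_vpc_ipam_pool_cidr instead of netmask_length pins it there.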

Step 2 — Provision the TGW and share it with RAM

Create the TGW in a dedicated network account (part of your AWS Organizations structure), then share it to every other account with Resource Access Manager (RAM). Turn off the default automation so route propagation and association become explicit, policy-driven decisions:

resource "aws_ec2_transit_gateway" "hub" {
  description                     = "Org hub TGW"
  default_route_table_association = "disable"
  default_route_table_propagation = "disable"
  dns_support                     = "enable"
  amazon_side_asn                 = 64512   # for any future BGP attachments
  tags = { Name = "tgw-hub" }
}

Sharing with the whole organization removes the per-account invitation dance. This requires that you have enabled RAM sharing within AWS Organizations once (aws ram enable-sharing-with-aws-organization):

resource "aws_ram_resource_share" "tgw" {
  name                      = "tgw-hub-share"
  allow_external_principals = false
}

resource "aws_ram_resource_association" "tgw" {
  resource_arn       = aws_ec2_transit_gateway.hub.arn
  resource_share_arn = aws_ram_resource_share.tgw.arn
}

# Share to the entire org (or to specific OUs by ARN)
resource "aws_ram_principal_association" "org" {
  principal          = "arn:aws:organizations::111122223333:organization/o-exampleorgid"
  resource_share_arn = aws_ram_resource_share.tgw.arn
}

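Once shared, a spoke account creates its attachment locally, referencing the shared TGW ID. Because the attachment crosses accounts, the network account must accept it unless the TGW was created with auto_accept_shared_attachments = "enable". This is the clean ownership split: the network account owns the TGW and its route tables; the spoke owns its VPC and attachment.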

resource "aws_ec2_transit_gateway_vpc_attachment" "spoke" {
  transit_gateway_id = "tgw-0abc123..."        # the shared TGW
  vpc_id             = aws_vpc.spoke.id
  subnet_ids         = [for s in aws_subnet.tgw : s.id]  # one /28 per AZ
  dns_support        = "enable"
  tags = { Name = "att-spoke-prod-app1" }
}
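A cross-account attachment stays pending until the TGW owner accepts it, unless auto-accept is enabled on the TGW. A minimal sketch of the acceptance step, assuming a provider alias named aws.network that is configured with credentials for the network account:

```hcl
# Hypothetical alias: credentials for the account that owns the TGW.
provider "aws" {
  alias  = "network"
  region = "eu-west-1"
}

# Runs against the network account and accepts the spoke's pending attachment
resource "aws_ec2_transit_gateway_vpc_attachment_accepter" "spoke" {
  provider                      = aws.network
  transit_gateway_attachment_id = aws_ec2_transit_gateway_vpc_attachment.spoke.id
  tags                          = { Name = "att-spoke-prod-app1" }
}
```

Keeping acceptance explicit gives the network team a review point; auto-accept trades that for less friction.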

Give the TGW its own tiny attachment subnets — a /28 per AZ is plenty — separate from workload subnets. Attach in every AZ you run workloads in; an attachment only delivers traffic to AZs where it has an ENI, and intra-AZ traffic avoids cross-AZ data charges.
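One way the aws_subnet.tgw subnets referenced by the attachment might be defined, assuming a /20 spoke VPC and a hypothetical azs variable listing the AZs in use:

```hcl
# Hypothetical input: the AZs this spoke runs workloads in.
variable "azs" {
  type    = list(string)
  default = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
}

resource "aws_subnet" "tgw" {
  # Map of AZ name => index, so each AZ gets a distinct /28
  for_each = { for idx, az in var.azs : az => idx }

  vpc_id            = aws_vpc.spoke.id
  availability_zone = each.key

  # A /20 holds 256 /28s (8 extra bits); take them from the top of the range
  # so workload subnets can grow upward from the bottom.
  cidr_block = cidrsubnet(aws_vpc.spoke.cidr_block, 8, 255 - each.value)

  tags = { Name = "tgw-att-${each.key}" }
}
```

Taking the attachment subnets from the end of the VPC range is only a convention; any three non-overlapping /28s work.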

Step 3 — Route-table segmentation

This is where a TGW earns its keep. A TGW route table is a routing domain. By controlling which attachments associate to a domain (which table they consult for outbound decisions) and which propagate into it (whose CIDRs appear there), you build isolation that a flat network cannot. The classic layout:

                   +------------------+
    prod spokes -> | prod RT          |  <- default route to egress attachment
                   +------------------+
non-prod spokes -> | non-prod RT      |  <- default route to egress attachment
                   +------------------+
 shared svc VPC -> | shared RT        |
                   +------------------+
     egress VPC -> | egress RT        |  <- return routes to every spoke
                   +------------------+
Goal: prod talks to prod and to shared services; non-prod talks to non-prod and to shared services; prod and non-prod never reach each other; everyone reaches the internet only through the central egress VPC.

resource "aws_ec2_transit_gateway_route_table" "prod" {
  transit_gateway_id = aws_ec2_transit_gateway.hub.id
  tags = { Name = "rt-prod" }
}

# A prod spoke associates to the prod table...
resource "aws_ec2_transit_gateway_route_table_association" "prod_app1" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.spoke.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.prod.id
}

# ...and propagates its CIDR INTO the shared-services table (so shared svc can reach it)
resource "aws_ec2_transit_gateway_route_table_propagation" "prod_into_shared" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.spoke.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.shared.id
}

The mental model that keeps this straight: association = “which table do I use for my decisions”, propagation = “into which tables do I advertise my routes.” To let prod reach shared services, propagate the shared-services attachment into the prod table; to let shared services reach prod, propagate prod into the shared table. Because you never propagate prod into the non-prod table (and vice versa), those two domains have no route to each other even though they share one TGW. Isolation is the absence of a route, not a firewall rule.
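The other half of that model, sketched under the assumption that a shared-services attachment (aws_ec2_transit_gateway_vpc_attachment.shared) and a non-prod table (aws_ec2_transit_gateway_route_table.non_prod) exist alongside the resources above; both names are illustrative:

```hcl
resource "aws_ec2_transit_gateway_route_table" "shared" {
  transit_gateway_id = aws_ec2_transit_gateway.hub.id
  tags               = { Name = "rt-shared" }
}

# The shared-services attachment consults the shared table for its decisions...
resource "aws_ec2_transit_gateway_route_table_association" "shared" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.shared.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.shared.id
}

# ...and advertises its CIDR into both environment tables, so prod and
# non-prod can each reach shared services without learning about each other.
resource "aws_ec2_transit_gateway_route_table_propagation" "shared_into_env" {
  for_each = {
    prod     = aws_ec2_transit_gateway_route_table.prod.id
    non_prod = aws_ec2_transit_gateway_route_table.non_prod.id
  }

  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.shared.id
  transit_gateway_route_table_id = each.value
}
```

Note what is absent: no resource propagates a prod attachment into the non-prod table or the reverse.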

The default route to the egress VPC is a static route in each spoke domain pointing at the egress attachment:

resource "aws_ec2_transit_gateway_route" "prod_default" {
  destination_cidr_block         = "0.0.0.0/0"
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.egress.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.prod.id
}
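One caveat is worth sketching here. A 0.0.0.0/0 route also matches addresses in the other environment's super-block, so prod traffic aimed at non-prod would follow the default toward the egress VPC rather than being dropped at the TGW. A common way to restore the "no route" guarantee is a more specific blackhole route in each environment table. The block below uses the example non-prod super-block from Step 1; the mirror-image route would go in the non-prod table.

```hcl
# Longest-prefix match prefers this /12 over the 0.0.0.0/0 default,
# so prod -> non-prod traffic is dropped at the TGW and never reaches the firewall.
resource "aws_ec2_transit_gateway_route" "prod_blackhole_non_prod" {
  destination_cidr_block         = "10.32.0.0/12" # non-prod super-block (example value)
  blackhole                      = true
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.prod.id
}
```

With this in place, a route search for the other environment's block returns a blackhole entry instead of nothing.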

Step 4 — Centralized egress through a shared NAT + Network Firewall VPC

Per-VPC NAT gateways are a cost and governance sprawl: every spoke pays for NAT, and you have no single place to inspect or log egress. Consolidate into one egress VPC in the network account that owns the NAT gateways and an AWS Network Firewall for inspection. Spoke default routes already point at this VPC’s TGW attachment (Step 3).

The traffic path matters. Inside the egress VPC, force the flow TGW -> firewall endpoint -> NAT gateway -> internet. That means three subnet tiers per AZ and route tables that hand off between them:

ingress from TGW
   |
   v  (TGW subnet route table: 0.0.0.0/0 -> firewall endpoint)
[firewall subnet: AWS Network Firewall endpoint]
   |
   v  (firewall subnet route table: 0.0.0.0/0 -> NAT gateway)
[public subnet: NAT gateway + IGW]
   |
   v
internet

resource "aws_networkfirewall_firewall" "egress" {
  name                = "fw-central-egress"
  firewall_policy_arn = aws_networkfirewall_firewall_policy.egress.arn
  vpc_id              = aws_vpc.egress.id

  dynamic "subnet_mapping" {
    for_each = aws_subnet.firewall
    content { subnet_id = subnet_mapping.value.id }
  }
}

Critically, set the firewall policy to drop unmatched traffic and add explicit allow rules — a default-allow inspection layer inspects nothing useful:

resource "aws_networkfirewall_firewall_policy" "egress" {
  name = "policy-central-egress"
  firewall_policy {
    stateless_default_actions          = ["aws:forward_to_sfe"]
    stateless_fragment_default_actions = ["aws:forward_to_sfe"]
    stateful_engine_options { rule_order = "STRICT_ORDER" }
    stateful_default_actions = ["aws:drop_established", "aws:alert_established"]

    stateful_rule_group_reference {
      resource_arn = aws_networkfirewall_rule_group.allowlist.arn
      priority     = 100
    }
  }
}
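The policy references aws_networkfirewall_rule_group.allowlist without showing it. A minimal sketch of what such a group could look like, assuming a domain-based allowlist; the name, capacity, domains, and HOME_NET range are placeholders to adapt:

```hcl
resource "aws_networkfirewall_rule_group" "allowlist" {
  name     = "allowlist-egress-domains"
  type     = "STATEFUL"
  capacity = 100 # placeholder; cannot be changed after creation

  rule_group {
    # In a centralized design the clients live outside the firewall VPC,
    # so HOME_NET is widened to cover the spoke ranges (assumed 10.0.0.0/8).
    rule_variables {
      ip_sets {
        key = "HOME_NET"
        ip_set {
          definition = ["10.0.0.0/8"]
        }
      }
    }

    rules_source {
      rules_source_list {
        generated_rules_type = "ALLOWLIST"
        target_types         = ["TLS_SNI", "HTTP_HOST"]
        targets              = [".amazonaws.com", ".example.com"] # illustrative domains
      }
    }

    # Must match the rule order declared in the firewall policy
    stateful_rule_options {
      rule_order = "STRICT_ORDER"
    }
  }
}
```

Anything not matched by an allow rule then falls through to the policy's default drop action.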

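The return path is the part people miss. The TGW route table associated with the egress attachment must carry routes back to every spoke CIDR (propagate all spokes into the egress domain). Inside the egress VPC, the public subnet route table needs each spoke summary pointing at the firewall endpoint so replies coming back through the NAT gateway are inspected, and the firewall subnet route table needs each spoke summary pointing back at the TGW. Because firewall endpoints are AZ-local, keep traffic symmetric: route an AZ’s flow through that same AZ’s firewall endpoint, and enable appliance mode on the egress VPC attachment so the TGW keeps both directions of a flow in one AZ and the stateful engine sees the whole conversation.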
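Put together, the egress VPC routing might look like the sketch below. It assumes route tables, NAT gateways, and attachment subnets keyed by AZ name (aws_route_table.tgw, aws_route_table.firewall, aws_route_table.public, aws_nat_gateway.egress, aws_subnet.egress_tgw); those names, the azs variable, and the spoke summaries are hypothetical. It also sets appliance mode on the egress attachment, which keeps both directions of a flow in the same AZ.

```hcl
variable "azs" {
  type    = list(string)
  default = ["eu-west-1a", "eu-west-1b"]
}

locals {
  spoke_summaries = ["10.16.0.0/12", "10.32.0.0/12", "10.48.0.0/12"]

  # AZ name => firewall endpoint ID, read from the firewall's sync state
  fw_endpoint_by_az = {
    for s in aws_networkfirewall_firewall.egress.firewall_status[0].sync_states :
    s.availability_zone => s.attachment[0].endpoint_id
  }

  # One entry per (AZ, spoke summary) pair for the return routes
  az_summary = {
    for pair in setproduct(var.azs, local.spoke_summaries) :
    "${pair[0]}|${pair[1]}" => { az = pair[0], cidr = pair[1] }
  }
}

resource "aws_ec2_transit_gateway_vpc_attachment" "egress" {
  transit_gateway_id     = aws_ec2_transit_gateway.hub.id
  vpc_id                 = aws_vpc.egress.id
  subnet_ids             = [for s in aws_subnet.egress_tgw : s.id]
  appliance_mode_support = "enable" # pin each flow to one AZ in both directions
}

# Forward path: TGW subnet -> firewall endpoint in the same AZ
resource "aws_route" "tgw_to_fw" {
  for_each               = toset(var.azs)
  route_table_id         = aws_route_table.tgw[each.key].id
  destination_cidr_block = "0.0.0.0/0"
  vpc_endpoint_id        = local.fw_endpoint_by_az[each.key]
}

# Forward path: firewall subnet -> NAT gateway in the same AZ
resource "aws_route" "fw_to_nat" {
  for_each               = toset(var.azs)
  route_table_id         = aws_route_table.firewall[each.key].id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.egress[each.key].id
}

# Return path: public subnet -> firewall endpoint for every spoke summary
resource "aws_route" "public_to_fw" {
  for_each               = local.az_summary
  route_table_id         = aws_route_table.public[each.value.az].id
  destination_cidr_block = each.value.cidr
  vpc_endpoint_id        = local.fw_endpoint_by_az[each.value.az]
}

# Return path: firewall subnet -> TGW for every spoke summary
resource "aws_route" "fw_to_tgw" {
  for_each               = local.az_summary
  route_table_id         = aws_route_table.firewall[each.value.az].id
  destination_cidr_block = each.value.cidr
  transit_gateway_id     = aws_ec2_transit_gateway.hub.id

  depends_on = [aws_ec2_transit_gateway_vpc_attachment.egress]
}
```

The public subnet's own 0.0.0.0/0 route to the internet gateway is assumed to exist already and is not shown.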

Network Firewall is billed per endpoint-hour plus per-GB processed. Centralizing means you pay for the endpoints once instead of per spoke, but the per-GB cost is real — this is why we drop east-west prod/non-prod traffic at the TGW (free, via missing routes) rather than hairpinning it through the firewall.

Step 5 — Centralized DNS with Route 53 Resolver

Spokes need to resolve private hosted zones, on-prem names, and AWS service endpoints consistently. Run Route 53 Resolver endpoints in the shared-services (or egress) VPC and point every spoke at them.

resource "aws_route53_resolver_endpoint" "outbound" {
  name      = "rslv-outbound"
  direction = "OUTBOUND"
  security_group_ids = [aws_security_group.resolver.id]
  dynamic "ip_address" {
    for_each = aws_subnet.resolver
    content { subnet_id = ip_address.value.id }
  }
}

resource "aws_route53_resolver_rule" "onprem" {
  name                 = "fwd-corp-internal"
  domain_name          = "corp.internal"
  rule_type            = "FORWARD"
  resolver_endpoint_id = aws_route53_resolver_endpoint.outbound.id
  target_ip { ip = "10.200.0.10" }
  target_ip { ip = "10.200.0.11" }
}

Share the rule across accounts with RAM, then associate it in each spoke VPC so the spoke honors it. For private hosted zones (e.g. an internal aws.example.com), associate the zone with each spoke VPC — or, at scale, share it and automate association. Spokes keep using the VPC .2 resolver; the rules ride underneath.
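A sketch of that share-then-associate flow. The share name and the onprem_rule_id variable are illustrative, and the organization ARN reuses the placeholder from Step 2:

```hcl
# --- Network account: share the forwarding rule with the organization ---
resource "aws_ram_resource_share" "resolver_rules" {
  name                      = "resolver-rules-share"
  allow_external_principals = false
}

resource "aws_ram_resource_association" "onprem_rule" {
  resource_arn       = aws_route53_resolver_rule.onprem.arn
  resource_share_arn = aws_ram_resource_share.resolver_rules.arn
}

resource "aws_ram_principal_association" "resolver_org" {
  principal          = "arn:aws:organizations::111122223333:organization/o-exampleorgid"
  resource_share_arn = aws_ram_resource_share.resolver_rules.arn
}

# --- Spoke account: associate the shared rule with the spoke VPC ---
# Hypothetical input: the shared rule's ID, passed in from the network account.
variable "onprem_rule_id" {
  type = string
}

resource "aws_route53_resolver_rule_association" "spoke_onprem" {
  resolver_rule_id = var.onprem_rule_id
  vpc_id           = aws_vpc.spoke.id
}
```

The two halves would normally live in separate Terraform configurations, one per account.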

Step 6 — Hybrid connectivity into the hub

Terminate Direct Connect or Site-to-Site VPN on the TGW, not on individual VPCs; that is the whole point of the hub. For VPN, create the VPN connection with the TGW as its target, which creates the VPN attachment for you. For Direct Connect, associate a Transit VIF with a Direct Connect Gateway (DXGW), then associate that DXGW with the TGW:

resource "aws_dx_gateway_association" "dx" {
  dx_gateway_id         = aws_dx_gateway.main.id
  associated_gateway_id = aws_ec2_transit_gateway.hub.id
  # Prefixes advertised to on-prem: summarized super-blocks (example values)
  allowed_prefixes      = ["10.16.0.0/12", "10.48.0.0/12"]
}
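For the VPN case, a minimal sketch might look like this. The customer gateway's ASN and public IP are placeholders (the address is from a documentation range), not values from this guide:

```hcl
resource "aws_customer_gateway" "dc" {
  bgp_asn    = 65010          # assumed on-prem ASN
  ip_address = "203.0.113.10" # placeholder public IP of the on-prem VPN device
  type       = "ipsec.1"
  tags       = { Name = "cgw-datacenter" }
}

# Targeting the TGW (rather than a virtual private gateway) creates a VPN attachment
resource "aws_vpn_connection" "dc" {
  customer_gateway_id = aws_customer_gateway.dc.id
  transit_gateway_id  = aws_ec2_transit_gateway.hub.id
  type                = "ipsec.1"
  tags                = { Name = "vpn-datacenter" }
}
```

The resulting attachment ID is exported as aws_vpn_connection.dc.transit_gateway_attachment_id and can be associated and propagated like any other attachment.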

Put the hybrid attachment in its own TGW route table. This lets you control exactly which environments on-prem can reach: associate the hybrid attachment with that table and propagate into it only the environments cleared for the data center. For the reverse direction, propagate the hybrid attachment into an environment’s table only if that environment is allowed to reach on-prem. Advertise summarized routes (your reserved super-blocks from Step 1) over BGP rather than hundreds of /20s; the DXGW has an allowed-prefixes limit, and summarization keeps you well under it.
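A sketch of that hybrid domain, assuming prod is the only environment cleared for on-prem. The data source looks up the attachment that exists once the DXGW has been associated with the TGW; the hybrid table name is illustrative:

```hcl
# Resolves only after the DXGW is associated with the TGW
data "aws_ec2_transit_gateway_dx_gateway_attachment" "dx" {
  transit_gateway_id = aws_ec2_transit_gateway.hub.id
  dx_gateway_id      = aws_dx_gateway.main.id
}

resource "aws_ec2_transit_gateway_route_table" "hybrid" {
  transit_gateway_id = aws_ec2_transit_gateway.hub.id
  tags               = { Name = "rt-hybrid" }
}

# Traffic arriving from on-prem consults the hybrid table
resource "aws_ec2_transit_gateway_route_table_association" "dx" {
  transit_gateway_attachment_id  = data.aws_ec2_transit_gateway_dx_gateway_attachment.dx.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.hybrid.id
}

# On-prem -> prod: the prod spoke advertises itself into the hybrid table
resource "aws_ec2_transit_gateway_route_table_propagation" "prod_into_hybrid" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.spoke.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.hybrid.id
}

# Prod -> on-prem: the hybrid attachment advertises on-prem routes into the prod table
resource "aws_ec2_transit_gateway_route_table_propagation" "hybrid_into_prod" {
  transit_gateway_attachment_id  = data.aws_ec2_transit_gateway_dx_gateway_attachment.dx.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.prod.id
}
```

Leaving out the equivalent pair for non-prod is what keeps non-prod and the data center unreachable from each other.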

Verify

Validate routing and inspection before declaring victory. Do not trust the console alone — test the data plane.

# 1. The prod route table should carry a static default to the egress attachment
aws ec2 search-transit-gateway-routes \
  --transit-gateway-route-table-id tgw-rtb-prod \
  --filters Name=type,Values=static

# 2. Prove prod CANNOT reach non-prod: route lookup should return blackhole/none
aws ec2 search-transit-gateway-routes \
  --transit-gateway-route-table-id tgw-rtb-prod \
  --filters Name=route-search.subnet-of-match,Values=10.32.0.0/12

# 3. From a spoke instance, confirm egress works and is inspected
curl -s https://checkip.amazonaws.com   # should return the NAT EIP, not the host

# 4. DNS: confirm the forwarding rule resolves on-prem names from a spoke
dig +short app.corp.internal

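# 5. Reachability Analyzer: define a path end to end, then run an analysis on it
aws ec2 create-network-insights-path \
  --source i-spoke --destination i-shared --protocol tcp
aws ec2 start-network-insights-analysis \
  --network-insights-path-id nip-EXAMPLE   # placeholder: ID returned by the previous call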

Check 2 is the security-critical one. If a prod-to-non-prod lookup returns an active route (anything other than a blackhole), your propagation is wrong and the two environments are not isolated. Reachability Analyzer (create-network-insights-path / start-network-insights-analysis) is the authoritative way to prove a path is, or is not, open across the whole TGW.

Operability checklist

Cost, observability, and scaling limits

Three things will bite a growing estate:

Cost. A TGW bills per attachment-hour and per GB of data processed, and Network Firewall adds its own per-GB processing charge on top for any traffic that traverses it. Two levers matter: keep cross-AZ traffic minimal (attach and route within-AZ), and never hairpin traffic through the firewall that you can simply drop at the TGW. East-west isolation via missing routes is free; isolation via firewall rules is not.

Observability. Enable TGW Flow Logs (distinct from VPC Flow Logs — they capture the inter-attachment view) and Network Firewall logging from day one. Centralize both in S3 and query with Athena. When someone reports “the app can’t reach the database,” flow logs plus Reachability Analyzer turn a multi-hour guessing game into a five-minute lookup.

Scaling past limits. A single TGW supports a large but finite number of attachments (a few thousand) and route-table routes per the documented service quotas — check the current figures and your account’s applied values before you design to the ceiling. Long before you hit a hard limit, segmentation is the real scaling tool: more route-table domains, summarized CIDRs from your IPAM hierarchy, and inter-region peering rather than one mega-region. If a single hub becomes a blast-radius or quota concern, split by business unit into multiple TGWs joined by peering — the CIDR discipline from Step 1 is exactly what makes that split painless.
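For the observability item above, a minimal sketch of both log sources writing to a central S3 bucket. The two bucket variables and the prefixes are assumptions:

```hcl
# Hypothetical inputs: a central logging bucket owned by the network account.
variable "log_bucket_arn" { type = string }
variable "log_bucket_name" { type = string }

# TGW Flow Logs: the inter-attachment view
resource "aws_flow_log" "tgw" {
  transit_gateway_id       = aws_ec2_transit_gateway.hub.id
  log_destination_type     = "s3"
  log_destination          = var.log_bucket_arn
  max_aggregation_interval = 60 # TGW flow logs require a 60-second interval
}

# Network Firewall: flow records plus alerts from drop/alert rules
resource "aws_networkfirewall_logging_configuration" "egress" {
  firewall_arn = aws_networkfirewall_firewall.egress.arn

  logging_configuration {
    log_destination_config {
      log_destination = {
        bucketName = var.log_bucket_name
        prefix     = "network-firewall/flow"
      }
      log_destination_type = "S3"
      log_type             = "FLOW"
    }

    log_destination_config {
      log_destination = {
        bucketName = var.log_bucket_name
        prefix     = "network-firewall/alert"
      }
      log_destination_type = "S3"
      log_type             = "ALERT"
    }
  }
}
```

Separate prefixes per log type keep the Athena table definitions simple.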

Enterprise scenario

A retail platform team had the textbook hub from this guide running across ~30 accounts: prod and non-prod isolated by route-table domains, all egress hairpinned through one Network Firewall VPC. Then S3 traffic spiked the bill. Every spoke was reaching S3 over the public path — TGW data-processing, plus Network Firewall per-GB, plus NAT — for what was internal bulk data. A nightly analytics job alone pushed terabytes through the central firewall, and the per-GB charges on both the TGW and the firewall dwarfed the compute.

The fix was to keep S3 and DynamoDB traffic off the hub entirely with gateway VPC endpoints in each spoke. A gateway endpoint is free, adds a prefix-list route in the spoke’s own route table, and never touches the TGW or the firewall:

resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.spoke.id
  service_name      = "com.amazonaws.eu-west-1.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}

The gotcha: the gateway endpoint’s prefix-list route is more specific than the 0.0.0.0/0 that points at the TGW, so longest-prefix match sends S3 traffic to the endpoint automatically, but only inside the VPC that owns the endpoint. They templated it into the spoke module so every account got one by default, then used an SCP that denies s3:* unless aws:SourceVpce matches an approved endpoint, closing the public path for good. Firewall-processed GB dropped by roughly 70%, and the central inspection layer went back to inspecting traffic that actually leaves the estate.

Next steps

Wire the whole thing into a pipeline: the network account’s Terraform owns the TGW, route tables, RAM shares, egress VPC, and resolver endpoints; spoke accounts consume the shared TGW ID and resolver-rule ARNs via a thin module. Add an SCP that denies creating internet gateways or NAT gateways in spoke accounts so egress cannot bypass the hub. That combination — central hub, RAM sharing, and guardrail SCPs — is what turns a working network into one you can hand to forty teams and still sleep at night.
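A sketch of that guardrail SCP, assuming the spoke accounts sit under one organizational unit whose ID is passed in through a hypothetical spoke_ou_id variable:

```hcl
# Hypothetical input: the OU that contains the spoke accounts.
variable "spoke_ou_id" {
  type = string
}

resource "aws_organizations_policy" "deny_spoke_egress" {
  name = "deny-igw-nat-in-spokes"
  type = "SERVICE_CONTROL_POLICY"

  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid    = "DenyDirectInternetEgress"
      Effect = "Deny"
      Action = [
        "ec2:CreateInternetGateway",
        "ec2:AttachInternetGateway",
        "ec2:CreateNatGateway",
      ]
      Resource = "*"
    }]
  })
}

# Attach to the spoke OU only, so the network account can still build the egress VPC
resource "aws_organizations_policy_attachment" "spokes" {
  policy_id = aws_organizations_policy.deny_spoke_egress.id
  target_id = var.spoke_ou_id
}
```

Attaching at the OU level rather than the organization root is the design choice that matters here: the network account has to stay outside the deny.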
