Building a Shared VPC: Centralized Networking Across Many GCP Projects

In a mature GCP organization, dozens of teams want their own projects but nobody wants dozens of disconnected networks. Shared VPC lets a single host project own the network while many service projects attach to it and deploy workloads into its subnets, keeping connectivity, IP space, and firewall policy under one roof. This walkthrough builds that seam end to end: model selection, project attachment, IP planning, subnet-scoped IAM, private access patterns, hybrid links, and centralized DNS.

Step 0: Choose the right model

Before touching gcloud, decide why you are not just peering everything. The three common patterns solve different problems.

| Model | Connectivity | IAM / ownership | Best for |
|---|---|---|---|
| Shared VPC | Service projects deploy into the host’s subnets; one routing domain | Network team owns subnets and firewall; app teams own VMs | A single org’s landing zone with central network control |
| VPC Peering | Two independent VPCs exchange routes, non-transitive | Each side owns its own network fully | Connecting separately-owned VPCs (e.g. partner, acquisition) |
| Network Connectivity Center (NCC) | Hub-and-spoke; spokes can be VPCs, HA VPN, Interconnect | Each spoke independent; hub provides transitivity | Many VPCs/sites needing transitive any-to-any reach |

The decisive question is ownership. Shared VPC centralizes control: a platform team defines subnets and firewall rules, and application teams consume them without ever creating a network. Peering keeps networks fully separate and is non-transitive (A-B and B-C does not give you A-C), which collapses at scale. NCC adds a transitive hub and is the right tool when you have many already-independent VPCs or a large hybrid estate to stitch together. These compose: a common enterprise topology is a Shared VPC per environment, with those host VPCs as NCC spokes for cross-environment or cross-region transitivity.

Rule of thumb: one team should govern IP allocation and firewall policy for a blast radius. That governance boundary is your Shared VPC.

Step 1: Designate the host project and attach service projects

You need the Compute Shared VPC Admin role (roles/compute.xpnAdmin) at the organization or folder level to enable a host and attach services. Project-level Owner is not enough.

# Enable the project as a Shared VPC host
gcloud compute shared-vpc enable HOST_PROJECT_ID

# Attach a service project to the host
gcloud compute shared-vpc associated-projects add SERVICE_PROJECT_ID \
  --host-project HOST_PROJECT_ID

# Verify the association
gcloud compute shared-vpc list-associated-resources HOST_PROJECT_ID

A host project can serve many service projects; a service project attaches to exactly one host. Keep the host project lean: it should contain the VPC, subnets, firewall rules, Cloud Routers, VPN/Interconnect, and Cloud DNS, and ideally no application workloads.

In Terraform the same wiring is explicit and reviewable:

resource "google_compute_shared_vpc_host_project" "host" {
  project = var.host_project_id
}

resource "google_compute_shared_vpc_service_project" "svc" {
  host_project    = google_compute_shared_vpc_host_project.host.project
  service_project = var.service_project_id
}

Step 2: Plan IP space and regional layout

IP planning is the decision you cannot cheaply undo. Subnets are regional in GCP, and a VPC is global, so a single VPC spans every region without peering. Carve a clear hierarchy out of RFC 1918 space and leave room to grow.

A workable convention is to give each region a large aggregate and slice subnets from it by environment and tier:

10.0.0.0/8                  org aggregate (reserve generously)
  10.10.0.0/16              region: us-central1
    10.10.0.0/20            prod   - general workloads
    10.10.16.0/20           prod   - GKE nodes
    10.10.64.0/20           nonprod
  10.20.0.0/16              region: europe-west1

For GKE you also need secondary ranges on the subnet for Pods and Services. Size the Pod range deliberately: with the default of 110 Pods per node, plan roughly a /24 of Pod IPs per node, which makes Pod ranges the fastest way to exhaust RFC 1918.
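
To make that sizing concrete, here is a minimal shell sketch of the arithmetic. It assumes every node reserves a full /24 of Pod addresses (the default 110 Pods per node case described above); the function name pod_range_max_nodes is illustrative and not part of any Google tooling.

```shell
# Upper bound on the nodes a Pod secondary range can serve,
# assuming each node reserves one block of the given per-node prefix.
pod_range_max_nodes() {
  local pod_prefix="$1"            # prefix length of the Pod range, e.g. 16
  local per_node_prefix="${2:-24}" # prefix reserved per node (assumed /24)
  if [ "$pod_prefix" -gt "$per_node_prefix" ]; then
    echo 0                         # range is smaller than one node's block
    return
  fi
  echo $(( 1 << (per_node_prefix - pod_prefix) ))
}

pod_range_max_nodes 16   # prints 256
pod_range_max_nodes 14   # prints 1024
```

Under this assumption a /16 Pod range tops out at 256 nodes, which is why Pod ranges deserve their own block in the plan.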

gcloud compute networks create hub-vpc \
  --project=HOST_PROJECT_ID \
  --subnet-mode=custom

gcloud compute networks subnets create prod-usc1 \
  --project=HOST_PROJECT_ID \
  --network=hub-vpc \
  --region=us-central1 \
  --range=10.10.0.0/20 \
  --secondary-range=pods=10.110.0.0/16,services=10.120.0.0/20 \
  --enable-private-ip-google-access \
  --enable-flow-logs

Avoiding RFC 1918 exhaustion is mostly discipline: never assign overlapping ranges (peering, VPN, and Interconnect all break on overlap), keep a documented IPAM source of truth, and reserve a separate non-overlapping block for GKE Pods so it never collides with on-prem.
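
An overlap check is easy to automate before a range ever reaches gcloud. The following is a rough sketch in plain shell arithmetic, not a replacement for a real IPAM tool; the helper names ip_to_int and cidr_overlaps and the candidate range are made up for illustration, and the existing ranges are the ones from the plan above.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1                     # split the address on dots into $1..$4
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeeds (exit 0) if two CIDR blocks overlap. Two blocks overlap exactly
# when their network addresses match under the shorter of the two prefixes.
cidr_overlaps() {
  local a_ip="${1%/*}" a_len="${1#*/}" b_ip="${2%/*}" b_len="${2#*/}"
  local min_len=$(( a_len < b_len ? a_len : b_len ))
  local mask=$(( (0xFFFFFFFF << (32 - min_len)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$a_ip") & mask )) -eq $(( $(ip_to_int "$b_ip") & mask )) ]
}

# Check a hypothetical candidate range against ranges already allocated.
candidate="10.10.8.0/21"
for existing in 10.10.0.0/20 10.10.16.0/20 10.10.64.0/20 10.110.0.0/16; do
  if cidr_overlaps "$candidate" "$existing"; then
    echo "overlap: $candidate vs $existing"
  fi
done
```

With these inputs the loop reports a single collision, against 10.10.0.0/20. Running a check like this in CI against the documented IPAM list is one way to keep the discipline from depending on memory.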

Step 3: Delegate subnet-level access with the right scope

This is the part teams get wrong. Granting roles/compute.networkUser at the host project level lets a service project use every subnet and shared resource. For least privilege, bind compute.networkUser at the individual subnet level so a team can only deploy into the subnets they own.

# Per-subnet grant: this team can only use prod-usc1
gcloud compute networks subnets add-iam-policy-binding prod-usc1 \
  --project=HOST_PROJECT_ID \
  --region=us-central1 \
  --member="group:team-payments@example.com" \
  --role="roles/compute.networkUser"

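You typically grant compute.networkUser to two kinds of principals: the humans or groups deploying resources, and the service agents that provision on their behalf. GKE and some managed services act through Google-managed service accounts in the service project (for example the GKE service agent, service-SVC_PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com, and the Google APIs service agent, SVC_PROJECT_NUMBER@cloudservices.gserviceaccount.com). Those agents need networkUser on the subnet; IAM is bound per subnet, not per range, so a subnet-level binding also covers the subnet's secondary ranges. For GKE, the GKE service agent additionally needs roles/container.hostServiceAgentUser on the host project.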

# GKE in a service project needs its host service agent user role
gcloud projects add-iam-policy-binding HOST_PROJECT_ID \
  --member="serviceAccount:service-SVC_PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com" \
  --role="roles/container.hostServiceAgentUser"
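
The subnet-level grants for the service agents can be scripted per service project. This sketch assumes the standard agent address formats (service-NUMBER@container-engine-robot.iam.gserviceaccount.com for GKE and NUMBER@cloudservices.gserviceaccount.com for the Google APIs service agent) and reuses the placeholder project and subnet names from this walkthrough.

```shell
# Look up the service project's number (agents are keyed by number, not ID)
SVC_PROJECT_NUMBER="$(gcloud projects describe SERVICE_PROJECT_ID \
  --format='value(projectNumber)')"

# Grant subnet-level networkUser to the GKE and Google APIs service agents
for agent in \
  "service-${SVC_PROJECT_NUMBER}@container-engine-robot.iam.gserviceaccount.com" \
  "${SVC_PROJECT_NUMBER}@cloudservices.gserviceaccount.com"; do
  gcloud compute networks subnets add-iam-policy-binding prod-usc1 \
    --project=HOST_PROJECT_ID \
    --region=us-central1 \
    --member="serviceAccount:${agent}" \
    --role="roles/compute.networkUser"
done
```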

Keep the scope tight from day one. Walking back a host-project-wide networkUser binding after fifty teams depend on it is a multi-quarter project.

Step 4: Private access patterns

Workloads in the host subnets usually should not have external IPs. Wire up private egress to Google and producer services in the host project so every service project inherits it.

Private Google Access lets instances that have only internal IPs reach Google APIs and services. Enable it per subnet (the --enable-private-ip-google-access flag above). If you want API traffic pinned to the private.googleapis.com (199.36.153.8/30) or restricted.googleapis.com (199.36.153.4/30) virtual IP ranges, point *.googleapis.com at that name in a private DNS zone and make sure the VPC has a route to the range whose next hop is the default internet gateway.
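
One way to wire the private.googleapis.com option is sketched below. The zone and route names (googleapis-private, private-googleapis) are placeholders, and the explicit route only matters if the VPC's default route has been removed or overridden; treat this as a starting point rather than a complete configuration.

```shell
# Private zone that overrides googleapis.com inside the host VPC
gcloud dns managed-zones create googleapis-private \
  --project=HOST_PROJECT_ID \
  --dns-name="googleapis.com." \
  --description="Send Google API traffic to private.googleapis.com" \
  --visibility=private \
  --networks=hub-vpc

# A record for the private.googleapis.com VIP range (199.36.153.8/30)
gcloud dns record-sets create "private.googleapis.com." \
  --project=HOST_PROJECT_ID \
  --zone=googleapis-private \
  --type=A --ttl=300 \
  --rrdatas="199.36.153.8,199.36.153.9,199.36.153.10,199.36.153.11"

# Wildcard CNAME so every *.googleapis.com name resolves to that VIP
gcloud dns record-sets create "*.googleapis.com." \
  --project=HOST_PROJECT_ID \
  --zone=googleapis-private \
  --type=CNAME --ttl=300 \
  --rrdatas="private.googleapis.com."

# Route to the VIP range via the default internet gateway
gcloud compute routes create private-googleapis \
  --project=HOST_PROJECT_ID \
  --network=hub-vpc \
  --destination-range=199.36.153.8/30 \
  --next-hop-gateway=default-internet-gateway
```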

Private Service Connect (PSC) gives you a private endpoint inside your VPC for Google APIs or for a third party’s published service, keeping traffic off the internet. Create the endpoint in the host project so it is reachable from all service projects:

# Reserve a global internal address for the PSC endpoint.
# It must sit outside every subnet range in the VPC (10.250.0.2 is an example).
gcloud compute addresses create psc-googleapis \
  --project=HOST_PROJECT_ID \
  --global \
  --purpose=PRIVATE_SERVICE_CONNECT \
  --network=hub-vpc \
  --addresses=10.250.0.2

# Global forwarding rule targeting the Google APIs bundle.
# Endpoint names for Google APIs allow only lowercase letters and digits (max 20 chars).
gcloud compute forwarding-rules create pscgoogleapis \
  --project=HOST_PROJECT_ID \
  --global \
  --network=hub-vpc \
  --address=psc-googleapis \
  --target-google-apis-bundle=all-apis

Serverless VPC Access connectors let Cloud Run, Cloud Functions, and App Engine reach internal resources in the Shared VPC. Create the connector in the host project against a dedicated, unused /28, and grant the service project’s serverless service agents the Serverless VPC Access User role (roles/vpcaccess.user) on the host project so they can use it:

gcloud compute networks vpc-access connectors create serverless-usc1 \
  --project=HOST_PROJECT_ID \
  --region=us-central1 \
  --network=hub-vpc \
  --range=10.8.0.0/28

Newer Cloud Run revisions support Direct VPC egress, which skips the connector entirely and scales better. Prefer it for new Cloud Run services; keep connectors for Cloud Functions and App Engine standard.
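
A Direct VPC egress deployment into the shared subnet might look like the following. The service name hello-internal is hypothetical, the image is Google's public sample container, and the sketch assumes the service project's Cloud Run service agent has already been granted use of the subnet in the host project.

```shell
# Deploy from the SERVICE project, attaching directly to the host's subnet
gcloud run deploy hello-internal \
  --project=SERVICE_PROJECT_ID \
  --region=us-central1 \
  --image=us-docker.pkg.dev/cloudrun/container/hello \
  --network=projects/HOST_PROJECT_ID/global/networks/hub-vpc \
  --subnet=projects/HOST_PROJECT_ID/regions/us-central1/subnetworks/prod-usc1 \
  --vpc-egress=private-ranges-only
```

Setting --vpc-egress=private-ranges-only sends only traffic to internal ranges through the VPC; switch to all-traffic if outbound internet traffic should also follow the VPC's routes.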

Step 5: Hybrid connectivity into the host project

Terminate all on-prem connectivity in the host project so every service project reaches the data center through one governed path. Use Cloud Interconnect (Dedicated or Partner) for high, predictable bandwidth, and HA VPN for encrypted connectivity over the internet. Both attach to a Cloud Router that exchanges routes via BGP, and those learned routes are visible to all service projects automatically.

HA VPN gives you a 99.99% SLA when both HA VPN gateway interfaces carry a tunnel to the peer (typically two peer devices, or one peer device with two interfaces):

gcloud compute routers create hub-router \
  --project=HOST_PROJECT_ID \
  --network=hub-vpc \
  --region=us-central1 \
  --asn=64512

gcloud compute vpn-gateways create hub-ha-vpn-gw \
  --project=HOST_PROJECT_ID \
  --network=hub-vpc \
  --region=us-central1

From there you create two vpn-tunnels (one per gateway interface), peer BGP sessions on hub-router, and advertise your VPC ranges. Because routing lives in the host project, a new service project gets on-prem reachability the moment it attaches and is granted a subnet, with no per-project VPN to manage.
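
The tunnel and BGP steps just described can be sketched as a loop over the two gateway interfaces. Everything on the peer side is an assumption for illustration: the gateway name onprem-gw, the documentation-range public IPs, the peer ASN 65001, the link-local /30s, and the VPN_SHARED_SECRET environment variable.

```shell
# Describe the on-prem side: two public IPs (placeholders from 203.0.113.0/24)
gcloud compute external-vpn-gateways create onprem-gw \
  --project=HOST_PROJECT_ID \
  --interfaces=0=203.0.113.10,1=203.0.113.11

# One tunnel, router interface, and BGP peer per HA VPN gateway interface
for i in 0 1; do
  gcloud compute vpn-tunnels create "hub-tunnel-${i}" \
    --project=HOST_PROJECT_ID \
    --region=us-central1 \
    --vpn-gateway=hub-ha-vpn-gw \
    --interface="${i}" \
    --peer-external-gateway=onprem-gw \
    --peer-external-gateway-interface="${i}" \
    --router=hub-router \
    --ike-version=2 \
    --shared-secret="${VPN_SHARED_SECRET}"

  # Link-local /30 per tunnel: .1 is the Cloud Router, .2 is the peer
  gcloud compute routers add-interface hub-router \
    --project=HOST_PROJECT_ID \
    --region=us-central1 \
    --interface-name="if-tunnel-${i}" \
    --vpn-tunnel="hub-tunnel-${i}" \
    --ip-address="169.254.${i}.1" \
    --mask-length=30

  gcloud compute routers add-bgp-peer hub-router \
    --project=HOST_PROJECT_ID \
    --region=us-central1 \
    --peer-name="onprem-peer-${i}" \
    --interface="if-tunnel-${i}" \
    --peer-ip-address="169.254.${i}.2" \
    --peer-asn=65001
done
```

By default Cloud Router advertises the VPC's subnet ranges, so custom advertisements are only needed when on-prem should see aggregates or additional prefixes.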

Watch route quotas. Dynamic routes learned over BGP and custom static routes both count against per-VPC limits. Aggregate advertised prefixes from on-prem rather than leaking hundreds of specifics.

Step 6: Centralized DNS

Run Cloud DNS in the host project. Private zones authoritative for your internal domains attach to the host VPC, so every service project resolves them. For resolving on-prem names, use DNS forwarding zones (forward to your on-prem resolvers); for the reverse, configure an inbound server policy so on-prem can resolve GCP names.

# Private zone for internal names, attached to the host VPC
gcloud dns managed-zones create internal-corp \
  --project=HOST_PROJECT_ID \
  --dns-name="corp.example.internal." \
  --description="Internal records" \
  --visibility=private \
  --networks=hub-vpc
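
The two hybrid pieces, outbound forwarding and the inbound server policy, might be set up along these lines. The on-prem domain dc.example.com, the resolver IPs, and the resource names are placeholders; on-prem firewalls also need to accept queries from Cloud DNS's forwarding source range, 35.199.192.0/19.

```shell
# Forwarding zone: send queries for the on-prem domain to on-prem resolvers
gcloud dns managed-zones create onprem-forward \
  --project=HOST_PROJECT_ID \
  --dns-name="dc.example.com." \
  --description="Forward on-prem names to data center resolvers" \
  --visibility=private \
  --networks=hub-vpc \
  --forwarding-targets=192.168.10.53,192.168.11.53

# Inbound server policy: expose resolver entry points in the host VPC
# that on-prem DNS servers can forward GCP names to
gcloud dns policies create hub-inbound \
  --project=HOST_PROJECT_ID \
  --description="Let on-prem resolvers query Cloud DNS" \
  --networks=hub-vpc \
  --enable-inbound-forwarding
```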

DNS peering lets one VPC delegate resolution of a zone to another VPC’s DNS configuration. In a Shared VPC this is mostly unnecessary because service projects already resolve the host VPC’s private zones, but it is the mechanism when you bridge resolution between separate VPCs (for example a Shared VPC host and a peered partner VPC).
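
For that bridging case, a peering zone is created in the consuming VPC's project and pointed at the host network. In this sketch CONSUMER_PROJECT_ID and consumer-vpc are hypothetical names, and the identity running the command is assumed to hold the DNS Peer role (roles/dns.peer) on the host project.

```shell
# Peering zone: consumer-vpc resolves corp.example.internal
# using the DNS configuration of hub-vpc in the host project
gcloud dns managed-zones create corp-peering \
  --project=CONSUMER_PROJECT_ID \
  --dns-name="corp.example.internal." \
  --description="Resolve corp names via the Shared VPC host network" \
  --visibility=private \
  --networks=consumer-vpc \
  --target-project=HOST_PROJECT_ID \
  --target-network=hub-vpc
```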

Verify

Confirm the seam actually works before handing it to teams.

# Host is enabled and the right service projects are attached
gcloud compute shared-vpc list-associated-resources HOST_PROJECT_ID

# Subnet-level IAM shows only the intended bindings
gcloud compute networks subnets get-iam-policy prod-usc1 \
  --project=HOST_PROJECT_ID --region=us-central1

# A test VM in a SERVICE project can target the host subnet
gcloud compute instances create conn-test \
  --project=SERVICE_PROJECT_ID \
  --zone=us-central1-a \
  --subnet=projects/HOST_PROJECT_ID/regions/us-central1/subnetworks/prod-usc1 \
  --no-address

# BGP sessions to on-prem are established and learning routes
gcloud compute routers get-status hub-router \
  --project=HOST_PROJECT_ID --region=us-central1

From the test VM, validate Private Google Access (curl -s https://storage.googleapis.com should succeed with no external IP), resolve an internal name from the central zone, and ping an on-prem host to prove hybrid routing. Then delete the test VM.
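
Those three checks and the cleanup can be run without logging in interactively. This sketch assumes a firewall rule already allows IAP's SSH range (35.235.240.0/20) to reach the VM, that a record named app.corp.example.internal exists in the central zone, and that 192.168.10.53 is a reachable on-prem host; substitute your own values.

```shell
# Run the checks over SSH; IAP tunneling is needed because the VM has no external IP
gcloud compute ssh conn-test \
  --project=SERVICE_PROJECT_ID \
  --zone=us-central1-a \
  --tunnel-through-iap \
  --command='
    curl -s -o /dev/null -w "storage API HTTP status: %{http_code}\n" https://storage.googleapis.com
    getent hosts app.corp.example.internal
    ping -c 3 192.168.10.53
  '

# Clean up the test VM
gcloud compute instances delete conn-test \
  --project=SERVICE_PROJECT_ID \
  --zone=us-central1-a \
  --quiet
```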

Enterprise scenario

A fintech platform team ran one Shared VPC per environment and had wired compute.networkUser at subnet scope cleanly. Then a new service project’s GKE Autopilot clusters refused to create, failing with Google Compute Engine: Required 'compute.subnetworks.use' permission on the shared subnet. The human operator had networkUser on the subnet, so the assumption was the binding was correct.

The gotcha: Autopilot and node auto-provisioning provision through the service project’s GKE service agent, not the operator’s identity. That agent (service-SVC_PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com) needs networkUser on the subnet, which also covers the Pod and Services secondary ranges because IAM is bound per subnet rather than per range, plus container.hostServiceAgentUser on the host project. Only the operators had been granted networkUser, so GKE’s own calls against the subnet were denied.

The fix was to bind the agent on the subnet explicitly, since the operators’ binding does not extend to it:

gcloud compute networks subnets add-iam-policy-binding prod-usc1 \
  --project=HOST_PROJECT_ID --region=us-central1 \
  --member="serviceAccount:service-SVC_PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com" \
  --role="roles/compute.networkUser"

They then codified the full set (service-agent networkUser on the subnet + hostServiceAgentUser on the host project) into the Terraform module that stamps every service project, so the next forty teams onboarded without a ticket. The lesson: in Shared VPC, service-agent IAM is the invisible half of the contract, and it breaks at cluster creation rather than at grant time.

Pitfalls and next steps

The recurring failure mode is owning the seam: the network team owns subnets and firewall rules in the host project, while application teams deploy workloads from service projects. Make that contract explicit, because a developer who cannot open a firewall rule will file a ticket, not a gcloud command. Prefer hierarchical firewall policies at the folder/org level for guardrails, and delegate scoped rule creation only where a team genuinely needs it.

Other traps: overlapping CIDRs that break peering and VPN routing; host-project-wide networkUser grants that erase least privilege; forgetting service-agent IAM so GKE clusters fail to create with cryptic permission errors; and route-quota exhaustion from leaking on-prem specifics. Next, codify all of this in Terraform with a per-environment host project, layer NCC if you need transitive multi-region or multi-VPC reach, and add Network Firewall Policies plus org policies (compute.vmExternalIpAccess) so the landing zone stays secure as it scales.
