GCP Lesson 23 of 98

Google Cloud Load Balancing, In Depth: Global vs Regional, the LB Types & Backends

Sooner or later every workload on Google Cloud needs a front door. One virtual machine is a single point of failure; the moment you run two, something has to spread traffic across them, notice when one dies, and keep users from ever seeing the failure. That something is a load balancer. On most clouds a load balancer is a single box you put in a region. On Google Cloud it is something stranger and more powerful: for the flagship product, the load balancer is the network itself — a single anycast IP address announced from over a hundred Google edge locations worldwide, with no instance to size, patch, or scale.

That power comes with a price: choice. Google Cloud does not have one load balancer, it has a family, and picking the wrong member is the single most common networking mistake new architects make. Reach for a global product when you only serve one region and you over-pay and over-engineer; reach for a regional one when you have a global audience and you lose the anycast front door that makes GCP special; confuse a proxy load balancer with a passthrough one and you spend an afternoon wondering why the client IP your application logs is wrong.

This lesson is the map. By the end you will be able to look at any workload — a global web app, an internal microservice, a TCP game server, a Cloud Run container — and name the exact load balancer it needs and why. We will walk the whole family with a decision table, then take the flagship apart screw by screw: the chain of resources from forwarding rule to backend that every Google Cloud load balancer is built from, the health checks that keep it honest, session affinity, balancing modes, Cloud Armor at the edge, and the serverless network endpoint groups that let a load balancer point at Cloud Run and Cloud Functions. It maps to the Associate Cloud Engineer (ACE) and Professional Cloud Network Engineer (PCNE) exams.

Learning objectives

By the end of this lesson you can:

Prerequisites & where this fits

You should be comfortable with a virtual private cloud (VPC), subnets and firewall rules — load balancers live inside a VPC and forward to backends in subnets, and a missing firewall rule for health-check probes is the classic reason a brand-new load balancer reports every backend as unhealthy. If those terms are hazy, read Google Cloud VPC, In Depth (gcp-vpc-deep-dive-subnets-routes-firewall-nat) first. A working knowledge of managed instance groups helps but is not required. This lesson sits in the Networking module of the Google Cloud Zero-to-Hero course, after VPC and before Google Kubernetes Engine. It is the conceptual companion to the hands-on build in Engineering the Global External Application Load Balancer on GCP (gcp-global-external-application-load-balancer-deep-dive): this lesson teaches you which load balancer to choose and how the pieces fit; that one walks you through wiring the flagship end to end with every tuning knob.

Core concepts: the two axes that define every load balancer

Before any product names, internalise the two questions that uniquely identify a Google Cloud load balancer. Every member of the family is just a point on this 2×2 (well, 2×3) grid.

Axis 1 — what kind of traffic? (the OSI layer). A load balancer either understands your application protocol or it does not.

The single most important consequence: a proxy load balancer hides the client IP (the backend sees Google’s IP unless you read the X-Forwarded-For header or enable the PROXY protocol), while a passthrough load balancer preserves it (the backend sees the real client). Architects who log the wrong field and see Google IPs everywhere have invariably forgotten this.
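To make the client-IP point concrete, here is a minimal shell sketch. It assumes the header format the global external ALB documents, where the load balancer appends `<client-ip>,<load-balancer-ip>` to whatever X-Forwarded-For value the client supplied. The function name `client_ip_from_xff` and the sample addresses are hypothetical, and other load balancer types may format the header differently, so verify before reusing it.

```shell
# Hypothetical helper: print the client IP that the load balancer itself
# recorded, i.e. the second-to-last entry of X-Forwarded-For.
# Assumes the form "<client-supplied>,<client-ip>,<lb-ip>".
client_ip_from_xff() {
  printf '%s\n' "$1" | awk -F',' '
    NF < 2 { exit 1 }   # too short to hold both the client and the LB entry
    { ip = $(NF - 1); gsub(/[ \t]/, "", ip); print ip }'
}

# Sample header with one client-supplied (untrusted) entry in front.
client_ip_from_xff "198.51.100.9, 203.0.113.7, 34.120.0.10"   # prints 203.0.113.7
```

Anything to the left of the second-to-last entry was sent by the client and can be spoofed, which is why the sketch counts from the right instead of taking the first value.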

Axis 2 — where does it live and who can reach it? (scope and exposure).

Two terms you will meet throughout:

One more idea worth fixing early: Google Cloud load balancers are software-defined, not appliances. There is no instance to provision, no throughput SKU to pick for the flagship, and the global ALB scales to millions of queries per second without any pre-warming. You configure a graph of resources and Google’s edge fabric runs it.

The load balancer family: a decision table

Here is the whole family on one page. Read the traffic type and scope columns first; they determine the product, and everything else is detail.

| Load balancer | Layer / proxy | Scope | Exposure | Protocols | Frontend IP | Network Tier | Primary use case |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Global external Application LB | L7 proxy | Global | External | HTTP, HTTPS, HTTP/2, gRPC | Global anycast | Premium | Internet-facing web apps & APIs with a global audience; Cloud CDN, advanced routing |
| Regional external Application LB | L7 proxy | Regional | External | HTTP, HTTPS, HTTP/2 | Regional | Standard or Premium | Internet-facing web app pinned to one region; data-residency or Standard-Tier cost |
| Internal Application LB | L7 proxy | Regional (or cross-region) | Internal | HTTP, HTTPS, HTTP/2, gRPC | Private (VPC) | n/a | L7 routing between internal microservices |
| External passthrough Network LB | L4 passthrough | Regional | External | TCP, UDP, ESP, ICMP, L3_DEFAULT | Regional | Standard or Premium | Internet-facing non-HTTP (game servers, custom TCP/UDP), source-IP preservation, very low overhead |
| Internal passthrough Network LB | L4 passthrough | Regional | Internal | TCP, UDP, ICMP, L3_DEFAULT | Private (VPC) | n/a | Internal L4 distribution; the only LB usable as a next-hop route; source-IP preserved |
| External proxy Network LB | L4 proxy | Global or regional | External | TCP, SSL (TLS) | Anycast (global) / regional | Premium / Standard | Internet-facing TCP with TLS offload but no L7 routing |
| Internal proxy Network LB | L4 proxy | Regional (or cross-region) | Internal | TCP, SSL | Private (VPC) | n/a | Internal TCP proxying / TLS offload between services |

How to read this in practice — the decision tree in words:

  1. Is your traffic HTTP/HTTPS/gRPC and do you want path/host routing, TLS termination, caching, or a WAF? Use an Application Load Balancer. Then: internet-facing and global audience → global external ALB; internet-facing but single region (or you need Standard Tier / data residency) → regional external ALB; service-to-service inside the VPC → internal ALB.
  2. Is it raw TCP/UDP (a game server, a database protocol, SMTP, syslog), or do you need the backend to see the real client IP, or do you need the lowest possible overhead? Use a passthrough Network LB — external for the internet, internal for inside the VPC. The internal passthrough NLB is also special: it is the only load balancer you can name as the next hop in a custom route, which is how you build network virtual appliance (firewall) chains.
  3. Is it TCP and you want TLS offload or a global anycast TCP front end but you do not need to inspect the application layer? Use a proxy Network LB.
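The three questions above can be sketched as a small lookup. This is only an illustration of the decision tree, not a Google tool: the function name `pick_lb` and its argument vocabulary (`http`, `tcp-proxy`, `passthrough`) are made up for this example.

```shell
# Hypothetical helper encoding the decision tree above.
# usage: pick_lb <traffic> <exposure> <scope>
#   traffic : http | tcp-proxy | passthrough
#   exposure: external | internal
#   scope   : global | regional (ignored for passthrough LBs, which are regional)
pick_lb() {
  traffic=$1 exposure=$2 scope=$3
  case "$traffic:$exposure:$scope" in
    http:external:global)    echo "global external Application LB" ;;
    http:external:regional)  echo "regional external Application LB" ;;
    http:internal:*)         echo "internal Application LB" ;;
    passthrough:external:*)  echo "external passthrough Network LB" ;;
    passthrough:internal:*)  echo "internal passthrough Network LB" ;;
    tcp-proxy:external:*)    echo "external proxy Network LB ($scope)" ;;
    tcp-proxy:internal:*)    echo "internal proxy Network LB" ;;
    *) echo "unknown combination: $traffic $exposure $scope" >&2; return 1 ;;
  esac
}

pick_lb http external global           # global web app
pick_lb passthrough external regional  # UDP game server that needs the real client IP
```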

A few clarifying notes that trip people up. The global external ALB is the modern, Envoy-based successor to the legacy “HTTP(S) Load Balancer”; you may still see the old name in documentation. The external ALB for a global audience comes in two modes: the global external ALB and the classic ALB (the latter is the older control plane); new builds should use the global (non-classic) one for the full feature set. The regional Application and proxy LBs and the internal ALB all run on the same open-source Envoy data plane, which is why they share advanced traffic-management features. The passthrough Network LBs never terminate connections: the external one is built on Google’s Maglev and the internal one on the Andromeda network virtualisation stack, which is why they preserve source IP and add almost no latency.

The building blocks: from forwarding rule to backend

Every Google Cloud load balancer — whatever its layer or scope — is assembled from the same chain of resources. Learn the chain once and you understand all of them; the only differences are which pieces are global vs regional and whether a URL map exists (L7 only). This is also exactly what an exam will ask you to put in order.

| # | Resource | What it does | Scope | L7 only? |
| --- | --- | --- | --- | --- |
| 1 | Forwarding rule | The front end: binds an IP address + port + protocol and points at a target proxy (L7/proxy) or backend service (passthrough). This is what clients connect to. | Global or regional | No |
| 2 | Target proxy | Terminates the connection. target-http(s)-proxy for ALB, target-tcp/ssl-proxy for proxy NLB. Holds the SSL certificate and SSL policy for HTTPS/SSL. References the URL map (L7) or backend service (proxy NLB). | Global or regional | Proxy LBs only |
| 3 | URL map | The router: matches host, path, header and query parameters and sends each request to the right backend service. Also does redirects and header/path rewrites. | Global or regional | Yes (ALB) |
| 4 | Backend service | The brain: groups backends, owns the health check, and holds policy (protocol, balancing mode, session affinity, timeouts, connection draining, Cloud CDN, Cloud Armor, logging). | Global or regional | No |
| 5 | Backend | The actual endpoints behind the service: a managed/unmanaged instance group (MIG), a network endpoint group (NEG; zonal, serverless, internet, hybrid, or Private Service Connect), or a Cloud Storage bucket (CDN origin). | Zonal/regional | No |
| - | Health check | Probes each backend and removes unhealthy ones from rotation. Attached to the backend service. | Global or regional | No |
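To picture what the URL map row means, here is a toy model of path-based routing in shell. It is a deliberate simplification: a real URL map also evaluates host rules, headers and query parameters. `web-bes` and `api-bes` are backend services named in this lesson; `static-bucket` is a hypothetical backend bucket added for illustration.

```shell
# Toy URL map: pick a backend from the request path, else fall back to the default service.
route_request() {
  path=$1
  case "$path" in
    /api/*)    echo "api-bes" ;;        # serverless backend service
    /static/*) echo "static-bucket" ;;  # hypothetical Cloud Storage backend
    *)         echo "web-bes" ;;        # default service
  esac
}

route_request /api/v1/users   # prints api-bes
route_request /index.html     # prints web-bes
```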

Read the chain top to bottom as a request’s journey: a packet hits the forwarding rule (the IP:port), which hands it to the target proxy (which terminates TLS), which consults the URL map (which inspects the path and chooses a route), which points at a backend service (which applies policy and load-balances), which selects a healthy backend endpoint. For a passthrough Network LB the chain is shorter — forwarding rule → backend service → backend — because there is no proxy and no URL map; packets flow straight through.
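For contrast, the shorter passthrough chain might look like the sketch below for an internal passthrough Network LB. The assumptions are stated loudly: a regional MIG called `tcp-mig` is hypothetical, as are the resource names `tcp-hc`, `tcp-bes` and `tcp-fr`, and the default network with its `default` subnet is assumed. The `run` wrapper prints each command unless you set `DRY_RUN=0`, so you can review the chain before creating anything.

```shell
# DRY_RUN=1 (the default) prints commands; DRY_RUN=0 executes them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

build_passthrough_chain() {
  region=us-central1
  # Health check (regional).
  run gcloud compute health-checks create tcp tcp-hc --port=80 --region="$region"
  # Backend service: nothing sits in front of it except the forwarding rule.
  run gcloud compute backend-services create tcp-bes \
    --load-balancing-scheme=INTERNAL --protocol=TCP \
    --health-checks=tcp-hc --health-checks-region="$region" --region="$region"
  run gcloud compute backend-services add-backend tcp-bes \
    --instance-group=tcp-mig --instance-group-region="$region" --region="$region"
  # Forwarding rule points straight at the backend service.
  run gcloud compute forwarding-rules create tcp-fr \
    --load-balancing-scheme=INTERNAL --network=default --subnet=default \
    --ip-protocol=TCP --ports=80 \
    --backend-service=tcp-bes --backend-service-region="$region" --region="$region"
}

build_passthrough_chain
```

Four commands, and neither a target proxy nor a URL map among them: that is the whole difference from the Application LB chain.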

Here is the chain built in gcloud for a global external Application LB in front of a managed instance group, so the abstractions become concrete. (The companion lesson, gcp-global-external-application-load-balancer-deep-dive, expands every flag below.)

# 5 + health check: a backend MIG already exists as "web-mig" in us-central1.
gcloud compute health-checks create http web-hc \
  --port=8080 --request-path=/healthz \
  --check-interval=5s --timeout=5s \
  --healthy-threshold=2 --unhealthy-threshold=3 \
  --global

# 4: backend service (global), the brain.
# EXTERNAL_MANAGED selects the global (non-classic) ALB; omit it and gcloud builds the classic one.
gcloud compute backend-services create web-bes \
  --load-balancing-scheme=EXTERNAL_MANAGED \
  --protocol=HTTP --port-name=http \
  --health-checks=web-hc \
  --global

# attach the MIG as a backend, with a balancing mode (see below).
gcloud compute backend-services add-backend web-bes \
  --instance-group=web-mig \
  --instance-group-region=us-central1 \
  --balancing-mode=UTILIZATION --max-utilization=0.8 \
  --global

# 3: URL map: send everything to web-bes for now.
gcloud compute url-maps create web-map --default-service=web-bes --global

# 2: target proxy (HTTP here; HTTPS would attach a certificate).
gcloud compute target-http-proxies create web-proxy --url-map=web-map --global

# 1: forwarding rule: reserve a global anycast IP, then bind :80.
# The scheme must match the backend service.
gcloud compute addresses create web-ip --ip-version=IPV4 --global
gcloud compute forwarding-rules create web-fr \
  --load-balancing-scheme=EXTERNAL_MANAGED \
  --address=web-ip --target-http-proxy=web-proxy \
  --ports=80 --global

Notice that --global appears on the health check, backend service, URL map, target proxy, address and forwarding rule. Consistency of scope is everything: mix a regional forwarding rule into this chain and you have silently built a regional ALB — a different product with no anycast. If gcloud complains that a resource “cannot be used” by another, a scope mismatch is the first thing to check.

Health checks: how the load balancer knows what is alive

A load balancer is only as good as its ability to stop sending traffic to a dead backend. That is the health check — a probe Google sends to each endpoint on an interval; pass enough times in a row and the backend is healthy and receives traffic, fail enough times and it is unhealthy and is pulled from rotation until it recovers.

| Setting | What it is | Choices / default | Notes |
| --- | --- | --- | --- |
| Protocol | How the probe is made | HTTP, HTTPS, HTTP/2, TCP, SSL, gRPC | Match it to your app. HTTP(S) checks can assert a path and an expected response. |
| Port | Where to probe | A fixed port, or use serving port, or a named port | A dedicated health-check port/path that checks dependencies (DB, cache) gives a truer signal than a static page. |
| Request path | The URL to hit (HTTP[S]) | default / | Use a real /healthz that returns 200 only when the instance can actually serve. |
| Check interval | Seconds between probes | default 5s | Lower = faster detection, more probe traffic. |
| Timeout | How long to wait for a reply | default 5s | Must be ≤ interval. |
| Healthy threshold | Consecutive passes to mark healthy | default 2 | |
| Unhealthy threshold | Consecutive fails to mark unhealthy | default 2 | Higher avoids flapping on a transient blip. |
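The two thresholds are easiest to understand as a tiny state machine. The sketch below is a toy model, not Google's implementation: it assumes a backend starts out unhealthy, and the function name `final_state` and the P/F notation are invented for this example.

```shell
# usage: final_state <healthy_threshold> <unhealthy_threshold> <results>
#   results is a string of probe outcomes: P = pass, F = fail, e.g. "PPFFP".
final_state() {
  healthy_thr=$1 unhealthy_thr=$2 results=$3
  state=UNHEALTHY passes=0 fails=0
  while [ -n "$results" ]; do
    rest=${results#?}          # everything after the first character
    probe=${results%"$rest"}   # the first character
    results=$rest
    if [ "$probe" = "P" ]; then
      passes=$((passes + 1)); fails=0
      if [ "$passes" -ge "$healthy_thr" ]; then state=HEALTHY; fi
    else
      fails=$((fails + 1)); passes=0
      if [ "$fails" -ge "$unhealthy_thr" ]; then state=UNHEALTHY; fi
    fi
  done
  echo "$state"
}

final_state 2 3 PPFF    # two fails are below the threshold of 3: prints HEALTHY
final_state 2 3 PPFFF   # three consecutive fails: prints UNHEALTHY
```

As a rough rule of thumb, detection time is about the check interval multiplied by the unhealthy threshold, so a 5-second interval with a threshold of 3 pulls a dead backend after roughly 15 seconds.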

Two operational facts cause almost every “all my backends are unhealthy” support ticket:

  1. You must allow the probe source IPs in your firewall. Health-check probes come from fixed Google ranges, not from your clients. For most modern load balancers (global ALB, proxy LBs, internal LBs) the probes originate from 130.211.0.0/22 and 35.191.0.0/16, and the global ALB’s proxied traffic arrives from the same ranges. Add an ingress allow rule for those ranges to your backend port or the load balancer reports everything down even though the app is fine. (For the Envoy-based regional and internal LBs, the proxied traffic itself arrives from your proxy-only subnet, which needs its own allow rule. Health checks for external passthrough NLBs come from 35.191.0.0/16, 209.85.152.0/22 and 209.85.204.0/22.)
  2. A health check is a load-balancing health check, not the same thing as an MIG autohealing health check. The load-balancing one removes a sick backend from traffic; an autohealing health check on the managed instance group recreates the VM. You usually want both, and you usually want the autohealing one to be more lenient so a brief load-balancer blip does not trigger a full VM rebuild.
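A sketch of the second point, assuming the lab's `web-mig` group in us-central1. The check name `web-autoheal-hc` and every threshold value are illustrative choices, not recommendations from Google. The `run` wrapper prints the commands unless `DRY_RUN=0` is set.

```shell
# DRY_RUN=1 (the default) prints commands; DRY_RUN=0 executes them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

configure_autohealing() {
  # Slower probes and a higher failure threshold than the load-balancing check,
  # so a short blip removes the VM from traffic without rebuilding it.
  run gcloud compute health-checks create http web-autoheal-hc \
    --port=80 --request-path=/healthz \
    --check-interval=10s --timeout=5s \
    --healthy-threshold=2 --unhealthy-threshold=5 --global
  # --initial-delay gives a new VM time to boot before failures count against it.
  run gcloud compute instance-groups managed update web-mig \
    --health-check=web-autoheal-hc --initial-delay=300 \
    --region=us-central1
}

configure_autohealing
```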

There is also a centralised vs distributed distinction for internal/regional Envoy-based load balancers: traditional health checks probe from Google’s central infrastructure, while distributed Envoy health checks probe from the Envoy proxies themselves — relevant at very large scale, but the central model is the default and is correct for most workloads.

Balancing mode and capacity: how traffic is spread

When a backend service has more than one backend, the balancing mode decides how a new request is assigned and, crucially, defines when a backend is considered “full” so traffic spills to the next region (for global LBs) or the next backend.

| Balancing mode | “Full” is measured by | Available on | Typical use |
| --- | --- | --- | --- |
| UTILIZATION | Average CPU utilisation of the instance group | Instance-group backends | General compute backends; cap with --max-utilization (e.g. 0.8). |
| RATE | Requests per second, per instance or per group | Instance groups & some NEGs | When you know the QPS a backend can take; cap with --max-rate / --max-rate-per-instance. |
| CONNECTION | Number of concurrent connections | TCP/SSL & passthrough backends | L4 load balancers where connections, not requests, are the unit. |

The companion levers:

For a global external ALB the practical pattern is: backends in two or more regions, each with a balancing mode and a sensible ceiling, so that normal traffic is served from the nearest region and a regional failure (or saturation) automatically overflows to the next — no DNS changes, no manual failover.
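The overflow arithmetic can be sketched in a few lines of shell. This is a simplified model for RATE mode, not Google's actual distribution algorithm: it assumes a region's ceiling is instances × max rate per instance × capacity scaler, and that anything above the ceiling spills to the next region. The function name `split_traffic` is hypothetical.

```shell
# usage: split_traffic <demand_rps> <instances> <max_rate_per_instance> <capacity_scaler_percent>
# Prints "<served_in_region> <overflow_to_next_region>".
split_traffic() {
  demand=$1 instances=$2 max_rate=$3 scaler_pct=$4
  capacity=$(( instances * max_rate * scaler_pct / 100 ))
  if [ "$demand" -le "$capacity" ]; then
    echo "$demand 0"
  else
    echo "$capacity $(( demand - capacity ))"
  fi
}

split_traffic 300 4 100 100   # ceiling 400, nothing overflows: prints "300 0"
split_traffic 500 4 100 100   # ceiling 400: prints "400 100"
split_traffic 300 4 100 50    # scaler halves the ceiling to 200: prints "200 100"
```

The last call shows why the capacity scaler is handy for draining a region: lowering it shrinks the ceiling without touching the instances.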

Session affinity: pinning a client to a backend

By default a load balancer treats every request independently and may send consecutive requests from the same user to different backends. Session affinity (“sticky sessions”) instead pins a client to the same backend, which matters for applications that keep per-user state in memory. It is configured on the backend service.

| Affinity type | Pins on | Layer | Notes |
| --- | --- | --- | --- |
| NONE | nothing (default) | any | Best distribution; use stateless backends + external session store. |
| CLIENT_IP | client IP (and protocol/port variants) | L4 / L7 | Coarse: clients behind one NAT share a backend; breaks if client IP changes. |
| GENERATED_COOKIE | a cookie the LB issues (GCLB) | L7 only | Most precise for web apps; survives client-IP changes. |
| HEADER_FIELD | a named HTTP header | L7 only | Affinity keyed on, e.g., a tenant header. |
| HTTP_COOKIE | a cookie you name | L7 only | Like generated cookie but you control the name/TTL/path. |

The architectural caveat worth saying out loud: session affinity is a performance optimisation, not a correctness guarantee. Affinity can break when a backend becomes unhealthy, when capacity is exceeded, or when the backend set changes — so a robust design keeps session state in Memorystore or a database and treats stickiness as a nice-to-have, not a load-bearing assumption. Also note affinity and balancing mode can pull against each other: strong affinity can leave some backends hotter than others, undermining even distribution.
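A toy illustration of why CLIENT_IP affinity is sticky yet fragile. It hashes the client IP onto one of N backends using `cksum`, which is chosen only because it is a convenient deterministic hash available everywhere; it is not what Google uses, and the function name `backend_for` is hypothetical.

```shell
# usage: backend_for <client_ip> <backend_count>
# Prints the index (0-based) of the backend this client would be pinned to.
backend_for() {
  ip=$1 backend_count=$2
  hash=$(printf '%s' "$ip" | cksum | cut -d' ' -f1)
  echo $(( hash % backend_count ))
}

backend_for 203.0.113.7 3   # same IP, same backend set: always the same index
backend_for 203.0.113.7 4   # change the backend set and the index may move
```

The second call is the "backend set changes" case from the caveat above: the same client can land somewhere new as soon as the number of backends changes, and whatever state lived in the old backend's memory is gone.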

Cloud Armor: WAF and DDoS at the edge

For external Application and external proxy Network load balancers you can attach Cloud Armor, Google’s web application firewall and DDoS service, as a security policy on the backend service. Because the global external ALB terminates connections at Google’s edge, Cloud Armor inspects and filters traffic at the edge — before it ever reaches your backends or even your region.

What Cloud Armor gives you:

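# A minimal Cloud Armor policy: deny one country, rate-limit the rest, attach it.
gcloud compute security-policies create web-armor --description="edge WAF"

# Rule 1000: block one country ('XX' is a placeholder for a two-letter region code).
gcloud compute security-policies rules create 1000 \
  --security-policy=web-armor \
  --expression="origin.region_code == 'XX'" \
  --action=deny-403

# Rule 2000: throttle every other client IP (the threshold values are illustrative).
gcloud compute security-policies rules create 2000 \
  --security-policy=web-armor --src-ip-ranges="*" \
  --action=throttle \
  --rate-limit-threshold-count=100 --rate-limit-threshold-interval-sec=60 \
  --conform-action=allow --exceed-action=deny-429 \
  --enforce-on-key=IP

gcloud compute backend-services update web-bes \
  --security-policy=web-armor --global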

The mental model: Cloud Armor attaches to the backend service, like Cloud CDN does, so policy is per-backend, not per-frontend — you can apply a strict WAF to your /admin backend and a looser one to static content. It is only available where there is an edge proxy to enforce it, i.e. external ALBs and external proxy NLBs, not the passthrough NLBs.

Serverless NEGs: load-balancing Cloud Run, Functions and App Engine

A load balancer does not only point at VMs. A network endpoint group (NEG) is a backend that is a set of endpoints rather than an instance group, and one of its most useful forms is the serverless NEG, which points the load balancer at a Cloud Run service, a Cloud Functions function, or an App Engine app. This is the supported way to put a custom domain, Cloud CDN, Cloud Armor, or path-based routing in front of serverless — capabilities the bare *.run.app URL does not give you.

NEG types worth knowing (this is exam fodder):

| NEG type | Endpoints are | Used by | Example |
| --- | --- | --- | --- |
| Zonal NEG (GCE_VM_IP_PORT) | IP:port of VMs/containers in a zone | ALB / proxy NLB | Fine-grained backends, GKE container-native LB |
| Serverless NEG | a Cloud Run / Functions / App Engine service | external & internal ALB | Custom domain + Cloud Armor in front of Cloud Run |
| Internet NEG (INTERNET_FQDN_PORT / INTERNET_IP_PORT) | an external FQDN or IP | global external ALB | Front an on-prem or third-party origin behind GCP CDN/Armor |
| Hybrid connectivity NEG (NON_GCP_PRIVATE_IP_PORT) | private IP:port reachable via VPN/Interconnect | ALB | Route to on-prem or another cloud over hybrid links |
| Private Service Connect NEG | a published PSC service | ALB | Reach a Google or partner service via PSC |

# Serverless NEG → Cloud Run service "api", wired into a global external ALB.
gcloud compute network-endpoint-groups create api-neg \
  --region=us-central1 \
  --network-endpoint-type=serverless \
  --cloud-run-service=api

# EXTERNAL_MANAGED matches the global (non-classic) ALB; no health check needed for serverless.
gcloud compute backend-services create api-bes \
  --load-balancing-scheme=EXTERNAL_MANAGED --global
gcloud compute backend-services add-backend api-bes \
  --global --network-endpoint-group=api-neg \
  --network-endpoint-group-region=us-central1
# then reference api-bes from the URL map as the route for /api/*

Two gotchas: serverless NEGs do not use health checks (the serverless platform manages availability), and a serverless NEG is regional — to serve a Cloud Run service globally you add a serverless NEG per region to one global backend service. The internet and hybrid NEGs are how the same global front door — with its anycast IP, CDN, and Cloud Armor — can sit in front of workloads that are not even on GCP.
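The "one serverless NEG per region" pattern might be scripted like this. The sketch assumes the Cloud Run service `api` is already deployed in every region you pass in and that the global backend service `api-bes` exists; the region list and the `api-neg-<region>` naming scheme are illustrative. As a safety net, `run` only prints the commands unless `DRY_RUN=0` is set.

```shell
# DRY_RUN=1 (the default) prints commands; DRY_RUN=0 executes them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# usage: attach_regions <region> [<region> ...]
attach_regions() {
  for region in "$@"; do
    # One regional serverless NEG pointing at the Cloud Run service in that region.
    run gcloud compute network-endpoint-groups create "api-neg-$region" \
      --region="$region" --network-endpoint-type=serverless \
      --cloud-run-service=api
    # Every NEG joins the same global backend service.
    run gcloud compute backend-services add-backend api-bes --global \
      --network-endpoint-group="api-neg-$region" \
      --network-endpoint-group-region="$region"
  done
}

attach_regions us-central1 europe-west1
```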

Google Cloud Load Balancing family

The diagram lays out the family along the two axes — Application (L7) versus Network (L4), external versus internal, global versus regional — and shows the shared resource chain (forwarding rule → target proxy → URL map → backend service → backend/NEG) that every member is assembled from.

Hands-on lab: build a global external Application LB over a managed instance group

This lab builds the flagship — a global external ALB serving a simple web app from a managed instance group — using only the GCP Free Tier and $300 credit. You will create the backend, the full resource chain, validate that traffic flows, and tear it all down.

Prerequisites: a project with billing enabled, the Compute Engine API enabled, and Cloud Shell (which has gcloud pre-installed and authenticated). Set defaults:

gcloud config set project YOUR_PROJECT_ID
gcloud config set compute/region us-central1
gcloud config set compute/zone us-central1-a
gcloud services enable compute.googleapis.com

Step 1 — a backend that serves something. Create an instance template whose VMs run a tiny web server on port 80 and identify themselves, then a managed instance group of two.

gcloud compute instance-templates create web-tmpl \
  --machine-type=e2-small \
  --image-family=debian-12 --image-project=debian-cloud \
  --tags=lb-backend \
  --metadata=startup-script='#! /bin/bash
    apt-get update && apt-get install -y nginx
    HOST=$(hostname)
    echo "Served by ${HOST}" > /var/www/html/index.html
    echo OK > /var/www/html/healthz'

gcloud compute instance-groups managed create web-mig \
  --template=web-tmpl --size=2 --region=us-central1
gcloud compute instance-groups set-named-ports web-mig \
  --named-ports=http:80 --region=us-central1

Step 2 — allow health-check and proxy traffic. Without this, every backend shows UNHEALTHY.

gcloud compute firewall-rules create allow-lb-health \
  --network=default --direction=INGRESS --action=ALLOW \
  --rules=tcp:80 \
  --source-ranges=130.211.0.0/22,35.191.0.0/16 \
  --target-tags=lb-backend

Step 3 — the resource chain. Health check → backend service → URL map → proxy → forwarding rule, all --global.

gcloud compute health-checks create http web-hc \
  --port=80 --request-path=/healthz --global

# EXTERNAL_MANAGED selects the global (non-classic) ALB; it must match on the forwarding rule.
gcloud compute backend-services create web-bes \
  --load-balancing-scheme=EXTERNAL_MANAGED \
  --protocol=HTTP --port-name=http --health-checks=web-hc --global
gcloud compute backend-services add-backend web-bes \
  --instance-group=web-mig --instance-group-region=us-central1 \
  --balancing-mode=UTILIZATION --max-utilization=0.8 --global

gcloud compute url-maps create web-map --default-service=web-bes --global
gcloud compute target-http-proxies create web-proxy --url-map=web-map --global

gcloud compute addresses create web-ip --ip-version=IPV4 --global
gcloud compute forwarding-rules create web-fr \
  --load-balancing-scheme=EXTERNAL_MANAGED \
  --address=web-ip --target-http-proxy=web-proxy --ports=80 --global

Step 4 — validate. Find the IP, wait for health, then curl it a few times.

gcloud compute addresses describe web-ip --global --format='value(address)'
# Backend health (wait until HEALTHY — can take a few minutes):
gcloud compute backend-services get-health web-bes --global

IP=$(gcloud compute addresses describe web-ip --global --format='value(address)')
for i in 1 2 3 4; do curl -s http://$IP/; done

Expected output: get-health eventually shows both instances HEALTHY. The curl loop returns Served by web-... and, across repeated calls, you should see both instance hostnames — proof the load balancer is distributing. (The first request after the IP goes live may take a minute or two to propagate across the edge; a 404/502 immediately after creation is normal — retry.)
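With more than four requests it is easier to count than to eyeball. A small tally helper, sketched under the assumption that each response body is a single line such as "Served by web-abcd" (the hostnames below are made up):

```shell
# Reads one response body per line on stdin and prints "<count> <body>" per distinct body.
tally_backends() {
  awk '{ seen[$0]++ } END { for (line in seen) print seen[line], line }' | sort
}

# Sample input standing in for real curl output.
printf 'Served by web-abcd\nServed by web-efgh\nServed by web-abcd\n' | tally_backends

# Against the live load balancer you could pipe the curl loop into it:
#   i=0; while [ "$i" -lt 20 ]; do curl -s "http://$IP/"; i=$((i + 1)); done | tally_backends
```

Two distinct lines in the output means both instances are taking traffic.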

Cleanup — delete in reverse order of creation (front to back), or the dependencies block deletion:

gcloud compute forwarding-rules delete web-fr --global -q
gcloud compute target-http-proxies delete web-proxy --global -q
gcloud compute url-maps delete web-map --global -q
gcloud compute backend-services delete web-bes --global -q
gcloud compute health-checks delete web-hc --global -q
gcloud compute addresses delete web-ip --global -q
gcloud compute firewall-rules delete allow-lb-health -q
gcloud compute instance-groups managed delete web-mig --region=us-central1 -q
gcloud compute instance-templates delete web-tmpl -q

Cost note: the global external ALB has a small hourly charge for the forwarding rule plus a per-GB data-processing charge, and the two e2-small VMs cost a few cents per hour. Running this lab for an hour costs well under a dollar and fits comfortably inside the $300 free credit — but the forwarding rule and the VMs bill while they exist, so do the cleanup. A reserved global IP that is not attached to a forwarding rule also incurs a small charge, which the cleanup releases.

Common mistakes & troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| All backends report UNHEALTHY, app works when you SSH in | Firewall does not allow the health-check ranges | Add ingress allow for 130.211.0.0/22 and 35.191.0.0/16 (plus 209.85.152.0/22, 209.85.204.0/22 for passthrough NLB) to the backend port |
| Accidentally built a regional LB; no global anycast | A regional forwarding rule / backend service slipped into the chain | Keep --global consistent across every resource; recreate the mismatched ones globally |
| Backend logs show Google IPs, not real client IPs | It is a proxy LB (ALB / proxy NLB), so the client IP is hidden | Read X-Forwarded-For (ALB) or enable PROXY protocol (proxy NLB); or use a passthrough NLB if you must have the raw source IP |
| 502 Bad Gateway from a healthy-looking app | Backend timeout exceeded, or app closed the keepalive before the LB’s timeout | Tune the backend-service --timeout; ensure the app’s keepalive ≥ the LB’s |
| HTTPS certificate “PROVISIONING” forever (managed cert) | DNS for the domain does not yet point at the LB IP | Point the A/AAAA record at the forwarding-rule IP; Google-managed certs validate via DNS and only go ACTIVE once it resolves |
| Sticky sessions sometimes break | Affinity is best-effort; backend went unhealthy / over capacity / set changed | Treat affinity as an optimisation; store session state in Memorystore or a DB |
| Serverless backend won’t attach / asks for a health check | Wrong NEG type, or expecting health checks on serverless | Use --network-endpoint-type=serverless; serverless backends need no health check |
| Traffic not overflowing to a second region on overload | No capacity ceiling set, so the LB never considers a region “full” | Set --max-utilization / --max-rate* so saturation triggers graceful overflow |

Best practices

Security notes

Interview & exam questions

  1. What are the two questions that determine which Google Cloud load balancer to use? (a) Traffic type — Application/L7 (HTTP/S/gRPC, proxy) vs Network/L4 (TCP/UDP, passthrough or proxy); (b) scope/exposure — global vs regional, external vs internal. Those two axes uniquely identify the product.
  2. Explain the difference between a proxy and a passthrough load balancer, and why it matters. A proxy LB terminates the client connection and opens a new one to the backend, so the backend sees Google’s IP (real client in X-Forwarded-For or via PROXY protocol) and the LB can do TLS termination, L7 routing, CDN and WAF. A passthrough LB forwards packets without terminating, so the backend sees the original client IP and replies directly — lowest overhead, no L7 features. It matters for client-IP logging, TLS handling, and which features are available.
  3. Put the resource chain of an Application Load Balancer in order. Forwarding rule → target (HTTP/S) proxy → URL map → backend service → backend (instance group or NEG); the health check attaches to the backend service. A passthrough NLB omits the proxy and URL map.
  4. A new global ALB shows all backends UNHEALTHY but the app responds over SSH. Why? The VPC firewall is not allowing the health-check/proxy source ranges 130.211.0.0/22 and 35.191.0.0/16 to the backend port. Add an ingress allow rule for them.
  5. When would you choose a regional external ALB over the global one? When the audience is in one region, when you need Standard Network Tier to cut egress cost, or when data-residency rules require traffic to stay in a region — at the cost of losing the global anycast front door.
  6. You need to load-balance a UDP game server and the backend must see the real player IP. Which LB? An external passthrough Network LB — L4, connectionless, preserves source IP, supports UDP. An ALB or proxy NLB would hide the client IP and not handle raw UDP.
  7. What is the only load balancer that can be a next hop in a route, and why does that matter? The internal passthrough Network LB. It enables steering traffic through network virtual appliances (firewalls/IDS), the basis of hub-and-spoke inspection architectures.
  8. How do you put a custom domain, Cloud CDN and Cloud Armor in front of a Cloud Run service? Create a serverless NEG pointing at the Cloud Run service, attach it to a backend service on a (global) external ALB, and route to it from the URL map. Serverless NEGs need no health check and are regional, so add one per region for global serving.
  9. What does the balancing mode do, and name the three modes. It defines how requests are assigned and when a backend is “full” (triggering overflow). Modes: UTILIZATION (CPU), RATE (requests/sec), CONNECTION (concurrent connections). Pair with --max-* ceilings and --capacity-scaler.
  10. Why is session affinity not a substitute for external session storage? Affinity is best-effort and can break when a backend becomes unhealthy, exceeds capacity, or the backend set changes — so per-user state must live in a shared store (Memorystore/DB); affinity is only an optimisation.
  11. Which load balancers can use Cloud Armor, and where does the policy attach? External Application LBs and external proxy Network LBs (there must be an edge proxy to enforce it); the security policy attaches to the backend service, so it is per-backend like Cloud CDN.
  12. What network tier do global load balancers require, and why? Premium Tier — global anycast and edge serving ride Google’s premium backbone; Standard Tier only supports regional load balancing.

Quick check

  1. Which two products are proxy Network Load Balancers, and what do they do that a passthrough NLB cannot?
  2. In the ALB resource chain, which resource owns the health check and the Cloud Armor policy?
  3. Your backend service has backends in two regions but never overflows when one is overloaded. What did you forget to configure?
  4. What NEG type fronts an on-prem origin behind GCP’s CDN and Cloud Armor?
  5. True or false: a Google-managed SSL certificate becomes ACTIVE before you point DNS at the load balancer IP.

Answers

  1. The external and internal proxy Network LBs. They terminate the TCP/SSL connection (enabling TLS offload and, for the external one, a global anycast TCP front end), whereas a passthrough NLB never terminates and so preserves the client IP but offers no offload or L7 features.
  2. The backend service owns both the health check and the Cloud Armor security policy (as well as balancing mode, session affinity, timeouts, and Cloud CDN).
  3. A balancing-mode capacity ceiling (--max-utilization, --max-rate*, or --max-connections*). Without a ceiling the LB never marks a region “full”, so it never overflows to the other region.
  4. An internet NEG (INTERNET_FQDN_PORT or INTERNET_IP_PORT) attached to a global external ALB.
  5. False. A Google-managed cert stays in PROVISIONING until the domain’s DNS resolves to the forwarding-rule IP; only then does it validate and go ACTIVE.

Exercise

Take a two-tier application: a public web front end and a private internal API the front end calls. Using the decision table, write down (a) which load balancer fronts the public web tier and why, including the network tier; (b) which load balancer the front end uses to reach the internal API and why; (c) the full resource chain you would create for the public LB; (d) where you would attach Cloud Armor and one rule you would add; and (e) if the API were re-platformed onto Cloud Run, exactly what changes in the internal LB’s backend (name the NEG type). Then sketch the gcloud commands for part (c) from memory and check them against the lab above.

Certification mapping

Glossary

Next steps

You can now name and assemble any Google Cloud load balancer and know which one each workload needs. To turn the flagship into a production front end — every forwarding-rule, URL-map, balancing-mode, hybrid-NEG, Cloud CDN, Cloud Armor and mTLS knob, wired end to end — read Engineering the Global External Application Load Balancer on GCP (gcp-global-external-application-load-balancer-deep-dive). After that, the course moves into containers with Google Kubernetes Engine, In Depth: Autopilot vs Standard, Node Pools, Networking & Security (gke-deep-dive-autopilot-standard-node-pools-networking), where the GKE Gateway and Ingress controllers provision the very load balancers you have just learned, driven by Kubernetes manifests.


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
