Cloud Functions 2nd gen is not a bigger version of the old runtime. It is Cloud Run with a function-shaped front door and Eventarc wired to the back. Once you internalize that, the whole platform stops being magic: your function is a container, it scales like Cloud Run, it bills like Cloud Run, and every event that reaches it arrives as a CloudEvent delivered over an Eventarc trigger. This guide builds the operational mental model for designing event-driven systems on that stack – how events route and filter, how to tune concurrency and scaling, and how to make handlers survive retries without corrupting state.
Everything below uses the current gcloud functions (Gen2) and gcloud eventarc surfaces. Where a default bites you, I call it out.
1. What actually changed from 1st gen
1st gen functions ran on a Google-managed, function-specific platform with their own event plumbing. 2nd gen functions are deployed as Cloud Run services and triggered through Eventarc. That single architectural decision drives every meaningful difference:
| Concern | 1st gen | 2nd gen |
|---|---|---|
| Underlying compute | Proprietary functions runtime | Cloud Run service (a real revision you can inspect) |
| Event delivery | Built-in, per-source plumbing | Eventarc + CloudEvents |
| Concurrency | 1 request per instance, always | Up to 1000 requests per instance |
| Request timeout | 540s max | Up to 3600s (HTTP-triggered); 540s for event-driven |
| Instance size | Capped (up to 8 GiB / 2 vCPU) | Larger CPU/memory ceilings via Cloud Run |
| Traffic splitting | No | Yes, via Cloud Run revisions |
| Min instances | Limited | First-class, set per function |
The practical consequence: a 2nd gen function shows up in the Cloud Run console as a service, you can gcloud run services describe it, and the same concurrency, min-instances, and CPU levers apply. You still deploy with gcloud functions deploy --gen2, which generates the source build, the Cloud Run service, and (for event triggers) the Eventarc trigger as one unit.
A minimal HTTP function for context:
```bash
gcloud functions deploy http-echo \
  --gen2 \
  --runtime=nodejs20 \
  --region=us-central1 \
  --source=. \
  --entry-point=echo \
  --trigger-http \
  --no-allow-unauthenticated
```
2. Eventarc architecture: providers, triggers, CloudEvents
Eventarc is the routing layer. Three concepts:
- Provider – the system that emits events (Cloud Storage, Pub/Sub, Firestore, or any service that writes Cloud Audit Logs).
- Event type – a specific thing that happened, identified by a `type` string such as `google.cloud.storage.object.v1.finalized`.
- Trigger – the binding that says “events of this type, matching these filters, go to this destination,” with an associated service account.
Every event Eventarc delivers is a CloudEvent (the CNCF spec). Direct events arrive in structured or binary content mode; your function receives a typed CloudEvent object. The attributes you will reference constantly:
| Attribute | Meaning |
|---|---|
| `type` | The event type (drives filtering) |
| `source` | The emitting resource |
| `subject` | The specific object affected (e.g. `objects/path/to/file.csv`) |
| `id` | Unique event id – your idempotency key |
| `time` | Event timestamp |
| `data` | The payload (object metadata, Pub/Sub message, Firestore document) |
A Node.js handler using the Functions Framework’s CloudEvent signature:
```javascript
const functions = require('@google-cloud/functions-framework');

functions.cloudEvent('handleObject', (cloudEvent) => {
  const { id, type, subject } = cloudEvent;
  const file = cloudEvent.data; // storage object metadata

  // One JSON object per line so Cloud Logging parses severity and fields.
  console.log(JSON.stringify({
    severity: 'INFO',
    message: 'received event',
    eventId: id,
    eventType: type,
    bucket: file.bucket,
    object: file.name,
  }));
});
```
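To make the attribute table concrete, here is a minimal sketch of how a binary-mode delivery maps onto those attributes. In binary content mode the CloudEvents HTTP binding carries each context attribute in a `ce-` prefixed header and the payload in the body. The Functions Framework does this mapping for you; `fromBinaryHeaders` is a hypothetical helper for illustration only, and the sample header values are made up.

```javascript
// Illustrative only: the Functions Framework performs this mapping for you.
// `fromBinaryHeaders` is a hypothetical helper, not a library API.
function fromBinaryHeaders(headers, body) {
  const event = {};
  for (const [name, value] of Object.entries(headers)) {
    const key = name.toLowerCase();
    // Binary content mode: each context attribute travels as a `ce-` header.
    if (key.startsWith('ce-')) {
      event[key.slice(3)] = value;
    }
  }
  // The HTTP body is the event payload (assumed to be JSON here).
  event.data = typeof body === 'string' ? JSON.parse(body) : body;
  return event;
}

// Made-up sample request for a Cloud Storage finalized event.
const sample = fromBinaryHeaders(
  {
    'ce-id': '1234567890',
    'ce-specversion': '1.0',
    'ce-type': 'google.cloud.storage.object.v1.finalized',
    'ce-source': '//storage.googleapis.com/projects/_/buckets/acme-prod-ingest',
    'ce-subject': 'objects/path/to/file.csv',
    'Content-Type': 'application/json',
  },
  '{"bucket":"acme-prod-ingest","name":"path/to/file.csv"}'
);
console.log(sample.type, sample.subject, sample.data.name);
```

Non-`ce-` headers such as `Content-Type` are deliberately ignored; they describe the body, not the event.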
3. Direct events vs Audit Log events, and trigger filtering
Eventarc delivers events two ways, and the distinction governs latency, cost, and which filters are legal.
Direct events come straight from sources that natively emit to Eventarc – Cloud Storage object events, Pub/Sub messages, Firestore document changes. They are low-latency and the supported path for those sources. You filter on type plus source-specific attributes (e.g. the bucket).
Cloud Audit Log events let you trigger on almost any Google Cloud API write, by matching the audit log entry. This is the catch-all when a service has no direct event: you filter on serviceName, methodName, and optionally resourceName. The cost is latency (audit logs are written then routed) and you must have Admin Activity or Data Access audit logs enabled for that service. Data Access logs are off by default for most services.
Filtering supports exact match and, for resourceName, a path-pattern operator. Create a Cloud Storage direct trigger:
```bash
gcloud eventarc triggers create gcs-finalize-trigger \
  --location=us-central1 \
  --destination-run-service=handle-object \
  --destination-run-region=us-central1 \
  --event-filters="type=google.cloud.storage.object.v1.finalized" \
  --event-filters="bucket=acme-prod-ingest" \
  --service-account=eventarc-invoker@acme-prod.iam.gserviceaccount.com
```
An Audit Log trigger – fire when anyone sets an IAM policy on a bucket:
```bash
gcloud eventarc triggers create iam-setpolicy-audit \
  --location=us-central1 \
  --destination-run-service=audit-handler \
  --destination-run-region=us-central1 \
  --event-filters="type=google.cloud.audit.log.v1.written" \
  --event-filters="serviceName=storage.googleapis.com" \
  --event-filters="methodName=storage.setIamPermissions" \
  --service-account=eventarc-invoker@acme-prod.iam.gserviceaccount.com
```
Audit Log triggers are powerful but the wrong default. If a direct event exists for your source, use it – it is faster, cheaper, and does not depend on audit log configuration that another team can change out from under you. Reserve Audit Log triggers for control-plane reactions (someone changed a firewall, someone created a service account key) where no direct event is available.
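When you do need an Audit Log trigger, the handler usually starts by pulling out the same fields the trigger filtered on, plus who made the call. The sketch below assumes the event `data` is a Cloud Logging entry whose `protoPayload` is the audit log record; `summarizeAuditEvent` and the sample values are hypothetical.

```javascript
// Hypothetical helper: flatten the fields of an Audit Log CloudEvent that a
// control-plane handler typically cares about.
function summarizeAuditEvent(cloudEvent) {
  const payload = (cloudEvent.data && cloudEvent.data.protoPayload) || {};
  const auth = payload.authenticationInfo || {};
  return {
    eventId: cloudEvent.id,
    service: payload.serviceName,
    method: payload.methodName,
    resource: payload.resourceName,
    actor: auth.principalEmail || 'unknown',
  };
}

// Made-up sample shaped like the IAM trigger described above.
const summary = summarizeAuditEvent({
  id: 'audit-event-1',
  type: 'google.cloud.audit.log.v1.written',
  data: {
    protoPayload: {
      serviceName: 'storage.googleapis.com',
      methodName: 'storage.setIamPermissions',
      resourceName: 'projects/_/buckets/acme-prod-ingest',
      authenticationInfo: { principalEmail: 'alice@example.com' },
    },
  },
});
console.log(JSON.stringify(summary));
```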
When deploying a 2nd gen function, the trigger is created for you. The equivalent of the GCS trigger above, expressed as a function deploy:
```bash
gcloud functions deploy handle-object \
  --gen2 \
  --runtime=nodejs20 \
  --region=us-central1 \
  --source=. \
  --entry-point=handleObject \
  --trigger-event-filters="type=google.cloud.storage.object.v1.finalized" \
  --trigger-event-filters="bucket=acme-prod-ingest" \
  --trigger-service-account=eventarc-invoker@acme-prod.iam.gserviceaccount.com
```
4. Cloud Storage, Pub/Sub, and Firestore triggers in practice
Cloud Storage. The event types you will use: object.v1.finalized (created or overwritten), object.v1.deleted, object.v1.archived, object.v1.metadataUpdated. A subtlety that causes duplicate processing: overwriting an object emits finalized again. Treat finalized as “an object version exists now,” not “a new file was uploaded.” The bucket must be in the same region (or a compatible location) as the trigger, and the bucket’s Pub/Sub publishing requires the Cloud Storage service agent to have the pubsub.publisher role – Eventarc wires this on first use, but in tight org-policy environments you grant it explicitly.
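One way to act on "an object version exists now" is to key your processing on the object version rather than the object name. A minimal sketch, assuming the event data carries the object's `bucket`, `name`, and `generation` fields; `objectVersionKey` is a hypothetical helper and the generation values are made up.

```javascript
// Hypothetical helper: identify one object *version*, so an overwrite of the
// same name produces a different key than the original upload.
function objectVersionKey(file) {
  if (!file || !file.bucket || !file.name || file.generation === undefined) {
    throw new Error('expected bucket, name and generation in the event data');
  }
  return `${file.bucket}/${file.name}#${file.generation}`;
}

const firstUpload = objectVersionKey({
  bucket: 'acme-prod-ingest',
  name: 'path/to/file.csv',
  generation: '1700000000000001',
});
const overwrite = objectVersionKey({
  bucket: 'acme-prod-ingest',
  name: 'path/to/file.csv',
  generation: '1700000000000002',
});
console.log(firstUpload !== overwrite); // true: same name, different versions
```

Whether an overwrite should be reprocessed or ignored is a business decision; the key only makes the two cases distinguishable.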
Pub/Sub. A Pub/Sub trigger is the most flexible primitive: any system that can publish a message can drive your function. Eventarc creates (or reuses) a subscription behind the trigger.
```bash
gcloud functions deploy process-message \
  --gen2 --runtime=nodejs20 --region=us-central1 \
  --source=. --entry-point=processMessage \
  --trigger-topic=orders-events
```
The message body is base64-encoded under cloudEvent.data.message.data:
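```javascript
const functions = require('@google-cloud/functions-framework');

functions.cloudEvent('processMessage', (cloudEvent) => {
  const msg = cloudEvent.data.message;
  // The body arrives base64-encoded; a message without data decodes to ''.
  const payload = msg.data
    ? Buffer.from(msg.data, 'base64').toString()
    : '';
  const order = JSON.parse(payload);
  // ... handle order, using msg.messageId or cloudEvent.id for idempotency
});
```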
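A bare `JSON.parse` throws on an empty or malformed body, which matters once retries are on because a bad message will fail the same way every time. The sketch below is one way to separate "could not decode" from "handler failed"; `decodeJsonMessage` is a hypothetical helper, and the sample messages are made up.

```javascript
// Hypothetical helper: decode a Pub/Sub message body without throwing, so the
// caller can decide what to do with an undecodable (poison) message.
function decodeJsonMessage(message) {
  if (!message || !message.data) {
    return { ok: false, reason: 'empty message body' };
  }
  const text = Buffer.from(message.data, 'base64').toString('utf8');
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, reason: `invalid JSON: ${err.message}` };
  }
}

const goodMessage = {
  data: Buffer.from(JSON.stringify({ orderId: 'o-1' })).toString('base64'),
};
const badMessage = { data: Buffer.from('not json').toString('base64') };

console.log(decodeJsonMessage(goodMessage).ok); // true
console.log(decodeJsonMessage(badMessage).ok);  // false
```

A handler can then log and acknowledge an undecodable message instead of throwing, since retrying it cannot succeed.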
Firestore. Trigger on document writes with a document-path pattern. Event types: document.v1.created, updated, deleted, written (any of the three). The path supports wildcards: a single-segment {userId} or a multi-segment {path=**}.
```bash
gcloud functions deploy on-order-write \
  --gen2 --runtime=nodejs20 --region=us-central1 \
  --source=. --entry-point=onOrderWrite \
  --trigger-event-filters="type=google.cloud.firestore.document.v1.written" \
  --trigger-event-filters="database=(default)" \
  --trigger-event-filters-path-pattern="document=customers/{customerId}/orders/{orderId}"
```
Note --trigger-event-filters-path-pattern for the wildcarded path versus plain --trigger-event-filters for exact matches.
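To build intuition for what the wildcards capture, here is a small matcher for the two wildcard forms. This is an illustration of the semantics described above, not Eventarc's implementation; `matchDocumentPath` is hypothetical, and as a simplification it only accepts `{name=**}` as the final segment and requires it to match at least one path segment.

```javascript
// Illustrative matcher for {name} (one segment) and {name=**} (the rest of
// the path). Returns the captured values, or null when the path does not match.
function matchDocumentPath(pattern, path) {
  const patternParts = pattern.split('/');
  const pathParts = path.split('/');
  const params = {};

  for (let i = 0; i < patternParts.length; i++) {
    const part = patternParts[i];

    const multi = /^\{(\w+)=\*\*\}$/.exec(part);
    if (multi) {
      if (i !== patternParts.length - 1) {
        throw new Error('this sketch only supports {name=**} as the last segment');
      }
      const rest = pathParts.slice(i);
      if (rest.length === 0) return null;
      params[multi[1]] = rest.join('/');
      return params;
    }

    if (i >= pathParts.length) return null;

    const single = /^\{(\w+)\}$/.exec(part);
    if (single) {
      params[single[1]] = pathParts[i];
    } else if (part !== pathParts[i]) {
      return null; // literal segment mismatch
    }
  }
  // Without a trailing ** the path must have exactly as many segments.
  return pathParts.length === patternParts.length ? params : null;
}

console.log(matchDocumentPath(
  'customers/{customerId}/orders/{orderId}',
  'customers/c1/orders/o9'
));
console.log(matchDocumentPath('customers/{path=**}', 'customers/c1/orders/o9'));
```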
5. Concurrency, min instances, and per-function scaling
Because a 2nd gen function is a Cloud Run service, the scaling story is the Cloud Run scaling story. The single biggest behavioral difference from 1st gen: concurrency can exceed 1. With concurrency 1 (the 1st gen default), every concurrent event spins a new instance. Raising concurrency lets one instance handle many events at once – dramatically cheaper for I/O-bound handlers, dangerous for ones that hold scarce resources.
```bash
gcloud functions deploy process-message \
  --gen2 --runtime=nodejs20 --region=us-central1 \
  --source=. --entry-point=processMessage \
  --trigger-topic=orders-events \
  --concurrency=20 \
  --cpu=1 --memory=512Mi \
  --min-instances=1 \
  --max-instances=50
```
Decision guidance:
- `--concurrency` – set above 1 only when your handler is concurrency-safe and mostly waiting on network. Each in-flight event shares the instance’s CPU and memory; size memory for the peak of concurrent handlers, not one.
- `--min-instances` – the antidote to cold starts. For latency-sensitive event paths, keep 1+ warm. You pay for idle instances, so reserve this for functions where cold-start tail latency matters.
- `--max-instances` – a backpressure valve. A function that writes to Cloud SQL must not be allowed to open thousands of connections; cap max-instances (times concurrency) below your connection budget.
- `--cpu` / `--memory` – raising concurrency without raising these starves handlers. The product `max-instances x concurrency` is your true peak parallelism against downstream systems.
The classic 2nd gen outage: team raises concurrency to 80 to save money, the function talks to a database with a 100-connection pool, max-instances is 50, and a traffic spike puts up to 50 x 80 = 4,000 in-flight handlers in front of a database that allows 100 connections. The fix is arithmetic, not heroics: bound `max-instances x concurrency` under the downstream limit, or front the database with a connection pooler.
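The arithmetic is simple enough to put in a deploy check. A minimal sketch, assuming each in-flight request holds at most one downstream connection; both function names are hypothetical.

```javascript
// Worst-case number of handlers running at once across all instances.
function peakParallelism(maxInstances, concurrency) {
  return maxInstances * concurrency;
}

// Largest max-instances that keeps peak parallelism at or under the limit.
function maxInstancesFor(downstreamLimit, concurrency) {
  const instances = Math.floor(downstreamLimit / concurrency);
  if (instances < 1) {
    throw new RangeError('concurrency alone exceeds the downstream limit; lower it first');
  }
  return instances;
}

console.log(peakParallelism(50, 80));  // 4000, against a 100-connection pool
console.log(maxInstancesFor(100, 80)); // 1
console.log(maxInstancesFor(100, 20)); // 5
```

In practice you would also leave headroom under the limit for other clients of the same database.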
6. Retry, dead-lettering, and idempotency
Event-triggered functions can be configured to retry on failure. With retries enabled, a handler that throws (or returns a non-2xx) is redelivered. Without it, the event is dropped on first failure. Enable retries only with idempotent handlers, because at-least-once delivery means duplicates are normal, not exceptional.
```bash
gcloud functions deploy handle-object \
  --gen2 --runtime=nodejs20 --region=us-central1 \
  --source=. --entry-point=handleObject \
  --trigger-event-filters="type=google.cloud.storage.object.v1.finalized" \
  --trigger-event-filters="bucket=acme-prod-ingest" \
  --retry
```
Idempotency. Use cloudEvent.id (stable across redeliveries of the same event) as a dedup key. Record processed ids in Firestore or another store and short-circuit duplicates:
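```javascript
const functions = require('@google-cloud/functions-framework');
const { Firestore } = require('@google-cloud/firestore');

const db = new Firestore();

functions.cloudEvent('handleObject', async (cloudEvent) => {
  const ref = db.collection('processed_events').doc(cloudEvent.id);

  // Atomically claim the event id; only the first delivery creates the marker.
  const created = await db.runTransaction(async (tx) => {
    const snap = await tx.get(ref);
    if (snap.exists) return false; // already handled
    tx.set(ref, { at: Date.now() });
    return true;
  });

  if (!created) {
    console.log(`duplicate ${cloudEvent.id}, skipping`);
    return;
  }

  try {
    // ... do the real, side-effecting work
  } catch (err) {
    // Release the claim so the retry is not mistaken for a duplicate.
    await ref.delete();
    throw err;
  }
});
```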
Dead-lettering. Cloud Functions retries do not bound themselves by default – a permanently poisoned event can retry indefinitely (capped by the event’s max retention window). The robust pattern is to put Pub/Sub between the source and your function and attach a dead-letter topic with --max-delivery-attempts. Then failures land in a DLQ you can inspect and replay instead of looping forever.
```bash
# Trigger subscription with a dead-letter topic and bounded attempts
gcloud pubsub subscriptions create orders-events-sub \
  --topic=orders-events \
  --dead-letter-topic=orders-events-dlq \
  --max-delivery-attempts=5 \
  --min-retry-delay=10s --max-retry-delay=600s
```
For direct Cloud Storage or Firestore triggers where you cannot interpose Pub/Sub easily, enforce a poison-pill guard in code: read the delivery attempt header / count, and after N tries write the event to a DLQ topic yourself and return success so Eventarc stops retrying.
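The in-code guard boils down to counting attempts per event id and switching behavior after N. A minimal sketch of that decision follows; `createPoisonGuard` is hypothetical, and the in-memory `Map` is an assumption for illustration only. Retries can land on different instances, so a real guard would keep the counter in a shared store such as Firestore.

```javascript
// Hypothetical guard: count deliveries per event id and flag the event for
// dead-lettering once it has exceeded maxAttempts.
function createPoisonGuard(maxAttempts, attempts = new Map()) {
  return function recordAttempt(eventId) {
    const attempt = (attempts.get(eventId) || 0) + 1;
    attempts.set(eventId, attempt);
    return { attempt, deadLetter: attempt > maxAttempts };
  };
}

const recordAttempt = createPoisonGuard(3);
const outcomes = [1, 2, 3, 4].map(() => recordAttempt('event-42'));
console.log(outcomes.map((o) => o.deadLetter)); // [ false, false, false, true ]
```

In a handler, a `deadLetter: true` result is the cue to publish the event to your DLQ topic and return normally, so the event is acknowledged instead of retried again.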
7. Securing functions: ingress, IAM invokers, VPC egress
Because the function is a Cloud Run service, you secure it like one.
Ingress. Lock down who can reach the URL. internal-only restricts to VPC and internal sources; internal-and-gclb adds traffic fronted by an external HTTPS load balancer (so you can put Cloud Armor in front).
```bash
gcloud functions deploy handle-object --gen2 --region=us-central1 \
  --source=. --entry-point=handleObject \
  --ingress-settings=internal-only \
  ...
```
IAM invoker. Eventarc delivers events by invoking the underlying Cloud Run service, so the trigger’s service account needs roles/run.invoker on it. Grant least privilege – a dedicated invoker SA per trigger, not the default compute SA:
```bash
gcloud run services add-iam-policy-binding handle-object \
  --region=us-central1 \
  --member="serviceAccount:eventarc-invoker@acme-prod.iam.gserviceaccount.com" \
  --role="roles/run.invoker"
```
For Audit Log and other Eventarc paths, the trigger SA also needs roles/eventarc.eventReceiver.
VPC egress. To reach private resources (Cloud SQL private IP, an internal API, on-prem over Interconnect), attach the function to a VPC. Direct VPC egress is the modern path; route all outbound through it so nothing escapes to the public internet:
```bash
gcloud functions deploy handle-object --gen2 --region=us-central1 \
  --source=. --entry-point=handleObject \
  --network=projects/acme-prod/global/networks/prod-vpc \
  --subnet=projects/acme-prod/regions/us-central1/subnetworks/run-egress \
  --vpc-connector-egress-settings=all-traffic \
  ...
```
If your function suddenly cannot reach a private database after you “secured” it, check egress settings first. `private-ranges-only` sends only RFC 1918 traffic through the VPC; `all-traffic` (a.k.a. all-egress) forces everything through it. Mismatched egress is the most common 2nd gen connectivity failure.
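When debugging, it helps to ask of each destination: does this address go through the VPC under my current setting? The sketch below models the rule as stated above, treating "private" as the three RFC 1918 IPv4 ranges only. That is a simplification for illustration, and `isRfc1918` and `routesThroughVpc` are hypothetical helpers, not part of any SDK.

```javascript
// True when the IPv4 address is in 10/8, 172.16/12 or 192.168/16.
function isRfc1918(ip) {
  if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(ip)) {
    throw new Error(`not an IPv4 address: ${ip}`);
  }
  const [a, b] = ip.split('.').map(Number);
  return a === 10 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168);
}

// Simplified model of the two egress settings discussed above.
function routesThroughVpc(ip, egressSetting) {
  if (egressSetting === 'all-traffic') return true;
  if (egressSetting === 'private-ranges-only') return isRfc1918(ip);
  throw new Error(`unknown egress setting: ${egressSetting}`);
}

console.log(routesThroughVpc('10.1.2.3', 'private-ranges-only')); // true
console.log(routesThroughVpc('8.8.8.8', 'private-ranges-only'));  // false
console.log(routesThroughVpc('8.8.8.8', 'all-traffic'));          // true
```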
8. Observability: structured logs, traces, error reporting
Logs from 2nd gen functions land in Cloud Logging under the Cloud Run service resource. Emit structured JSON to stdout/stderr so the severity and your custom fields become first-class log fields. The handler in section 2 already does this; the payoff is queryability.
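A tiny helper keeps every log line in that shape. This is a sketch, not a logging library: `formatLogEntry` and `logEvent` are hypothetical names, and the choice to send `ERROR` and above to stderr is an assumption rather than a platform requirement.

```javascript
// Build one JSON log line. Custom fields are spread first so they can never
// overwrite severity or message.
function formatLogEntry(severity, message, fields = {}) {
  return JSON.stringify({ ...fields, severity, message });
}

// Write the line to stderr for errors, stdout otherwise.
function logEvent(severity, message, fields) {
  const line = formatLogEntry(severity, message, fields);
  if (severity === 'ERROR' || severity === 'CRITICAL') {
    console.error(line);
  } else {
    console.log(line);
  }
}

logEvent('INFO', 'received event', { eventId: '1234567890-abcdef' });
logEvent('ERROR', 'ledger post failed', { eventId: '1234567890-abcdef' });
```

Because `eventId` is a top-level field, it surfaces as `jsonPayload.eventId` in queries.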
Find every event a function failed to process, in Logs Explorer (Logging Query Language):
```
resource.type="cloud_run_revision"
resource.labels.service_name="handle-object"
severity>=ERROR
```
Correlate a single event end-to-end by its CloudEvent id:
```
resource.type="cloud_run_revision"
jsonPayload.eventId="1234567890-abcdef"
```
Traces. Cloud Run / Functions 2nd gen integrates with Cloud Trace; instrument with OpenTelemetry and propagate context to downstream calls so a slow event handler shows its database span. Error Reporting automatically groups stack traces from your logs – emit exceptions with a stack to stderr and they aggregate into trackable issues with notifications, which is how you catch a poison-pill loop before it burns your retry budget.
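Even before full OpenTelemetry instrumentation, you can tie log lines to a trace by adding the trace resource name to each structured entry. The sketch below assumes the incoming request carries an `X-Cloud-Trace-Context` header of the form `TRACE_ID/SPAN_ID;o=OPTIONS` and that the `logging.googleapis.com/trace` log field is what groups entries under a trace; `traceLogFields`, the project id, and the sample header value are hypothetical.

```javascript
// Hypothetical helper: turn a trace header into the structured-log field that
// links a log entry to its trace. Returns {} when no usable trace id is found.
function traceLogFields(traceHeader, projectId) {
  if (!traceHeader) return {};
  // The trace id is the 32-hex-character prefix of the header value.
  const match = /^([0-9a-f]{32})(?:[\/;]|$)/i.exec(traceHeader);
  if (!match) return {};
  return {
    'logging.googleapis.com/trace': `projects/${projectId}/traces/${match[1]}`,
  };
}

const traceFields = traceLogFields(
  '105445aa7843bc8bf206b12000100000/1;o=1',
  'acme-prod'
);
console.log(JSON.stringify({ severity: 'INFO', message: 'received event', ...traceFields }));
```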
Watch these signals in particular: Cloud Run instance count (scaling against your downstream limits), request latency p99 (cold starts and slow handlers), and Pub/Sub dead-letter topic depth (your poison-pill detector).
Verify
Confirm the wiring end-to-end before declaring victory.
```bash
# 1. The function deployed as a Gen2 Cloud Run service
gcloud functions describe handle-object --gen2 --region=us-central1 \
  --format="value(state, serviceConfig.uri)"

# 2. The Eventarc trigger exists and points at the service
gcloud eventarc triggers describe handle-object \
  --location=us-central1 \
  --format="yaml(eventFilters, destination, serviceAccount)"

# 3. The trigger SA can invoke the underlying service
gcloud run services get-iam-policy handle-object --region=us-central1

# 4. Drive a real event and watch it land
echo "verify-$(date +%s)" > /tmp/probe.txt
gcloud storage cp /tmp/probe.txt gs://acme-prod-ingest/probe.txt

# 5. Confirm processing in logs (look for your eventId)
gcloud functions logs read handle-object --gen2 --region=us-central1 --limit=20
```
A healthy result: step 1 shows ACTIVE, step 2 shows your filters and the invoker SA, step 3 lists that SA with roles/run.invoker, and step 5 shows a log line with the bucket/object you just wrote. If the upload succeeds but no log appears, the trigger SA almost always lacks invoker permission, or an Audit Log trigger is waiting on logs that are not enabled.
Enterprise scenario
A payments platform team ingested settlement files via a Cloud Storage finalized trigger that parsed each file and posted ledger entries to a downstream API. It worked in staging and fell over in production the first month-end. Two failures compounded:
- Their batch system re-uploaded a handful of files after a transient failure. Each overwrite emitted another `finalized` event, and because the handler was not idempotent, those settlements were posted to the ledger twice – a reconciliation incident, not just a bug.
- They had enabled `--retry` for resilience. When the downstream ledger API was briefly overloaded, handlers threw, events retried, instances multiplied, and the retry storm kept the ledger API pinned – the retries became the outage.
The constraint: ledger posts had to be exactly-once against a partner API with a hard rate limit, and the team could not modify the upstream batch system that re-uploaded files.
The fix had three moves. First, idempotency keyed on cloudEvent.id plus the object generation, recorded in a Firestore processed_events collection inside a transaction (the section 6 pattern), so a re-uploaded file’s duplicate event short-circuited. Second, they stopped triggering the parser directly and put Pub/Sub with a dead-letter topic in the path, bounding --max-delivery-attempts=5 so a poisoned file landed in a DLQ for a human instead of retrying forever. Third, they bounded max-instances x concurrency under the partner’s rate limit so the function could never out-pace the API.
# Bound parallelism against the partner rate limit, and bound retries via DLQ
```bash
gcloud functions deploy settlement-parser \
  --gen2 --runtime=nodejs20 --region=us-central1 \
  --source=. --entry-point=parseSettlement \
  --trigger-topic=settlement-files \
  --concurrency=4 --max-instances=10 \
  --cpu=1 --memory=512Mi

gcloud pubsub subscriptions update settlement-files-sub \
  --dead-letter-topic=settlement-files-dlq \
  --max-delivery-attempts=5
```
Net effect: 10 instances x 4 = 40 max in-flight, comfortably under the partner limit; duplicates de-duped at the door; poison files quarantined in a DLQ with an alert on its depth. Month-end since has been quiet. The lesson is the one this whole platform rewards: at-least-once delivery plus retries is a correctness contract, not a convenience – design the handler for duplicates and bound the blast radius, or the resilience features become the incident.