Pub/Sub Delivery Guarantees: Exactly-Once, Ordering Keys, Dead-Letter, and Flow Control

Pub/Sub is easy to start with and easy to get wrong. The defaults give you a horizontally scalable, at-least-once bus that will happily redeliver messages, deliver them out of order, and silently retry a poison message forever while your subscriber CPU melts. Every one of those behaviors is configurable, but the configuration is subtle: exactly-once is region-scoped and pull-only, ordering keys cap your throughput, dead-letter topics need IAM you have to grant by hand, and flow control lives in the subscriber client rather than the subscription resource.

This is a working guide to wiring all of it correctly. Commands use gcloud and the Python client library. Replace PROJECT_ID and PROJECT_NUMBER with your own throughout.

1. Delivery semantics: at-least-once vs exactly-once

Pub/Sub’s default is at-least-once delivery. A message is delivered to a subscriber at least once; under normal operation usually exactly once, but duplicates are expected and legal. Duplicates arise from three sources you cannot fully eliminate at the at-least-once tier:

Ack deadline expiry. The subscriber held the message past its ack deadline (network blip, slow handler, GC pause), so Pub/Sub redelivers.
Lost acks. The subscriber acked, but the ack didn’t reach the service in time, so the message is redelivered.
Publisher retries. A publish RPC times out, the client retries, and the same logical event lands twice with different message IDs.

The correct baseline posture is idempotent consumers. Design every handler so that processing the same business event twice is a no-op: dedupe on a stable business key, use conditional writes, or fold into idempotent upserts. If your consumer is idempotent, at-least-once is almost always the right and cheapest choice.
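
As a minimal sketch of the dedupe-on-a-business-key idea above: the wrapper below turns a repeated event into a no-op. The event_id field and the in-memory set are assumptions for illustration; a real consumer would use a durable store with a conditional write.

```python
import json
from typing import Callable, Set


def make_idempotent_handler(apply_event: Callable[[dict], None]) -> Callable[[bytes], bool]:
    """Wrap apply_event so a repeated business key becomes a no-op.

    The in-memory set is a stand-in (assumption); production code would
    need a durable store shared by all subscriber instances.
    """
    seen: Set[str] = set()

    def handle(data: bytes) -> bool:
        event = json.loads(data)
        key = event["event_id"]  # hypothetical stable business key
        if key in seen:
            return False  # duplicate delivery: skip the side effect
        apply_event(event)
        seen.add(key)  # record only after the side effect succeeded
        return True

    return handle
```

The handler returns True when it applied the event and False when it skipped a duplicate, so the caller can ack in both cases.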

Exactly-once delivery is a stronger, opt-in guarantee that is frequently misunderstood. It does not mean a message is processed exactly once across your whole system; it means that within a single subscription, once a message is successfully acknowledged, it will not be redelivered, and while a message is outstanding (lease not expired) it will not be redelivered to another subscriber. It removes the ack-deadline and lost-ack duplicate classes, not publisher-side duplicates. It costs more, has higher latency, and lower throughput. Reach for it only when idempotency is genuinely impractical.

Rule of thumb: make consumers idempotent first. Add exactly-once only for the small set of subscriptions where dedupe is impossible (e.g. non-idempotent financial side effects that can’t carry a business key).

2. Enabling exactly-once delivery and idempotent ack handling

Exactly-once is a subscription property. Two hard constraints: it is pull-only (not supported on push subscriptions, because the push receiver can’t confirm the service received its response), and it only holds when subscribers connect in a single region. From outside Google Cloud, use a locational endpoint (for example us-east1-pubsub.googleapis.com:443) rather than the global one so all subscriber connections pin to one region.

gcloud pubsub subscriptions create orders-eo-sub \
  --topic=orders \
  --enable-exactly-once-delivery \
  --ack-deadline=60 \
  --message-retention-duration=7d

A 60s ack deadline is the recommended default for exactly-once subscriptions: longer deadlines absorb transient network events that would otherwise cause redelivery. The deadline range is 10 to 600 seconds.
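
For the single-region constraint described above, one way to pin a Python subscriber to a locational endpoint is through client_options. This is a sketch: the helper names are hypothetical, and the region you pass is your own choice.

```python
def locational_endpoint(region: str) -> str:
    """Build a regional Pub/Sub endpoint such as us-east1-pubsub.googleapis.com:443."""
    return f"{region}-pubsub.googleapis.com:443"


def make_regional_subscriber(region: str):
    """Create a SubscriberClient whose connections all target one region."""
    # Imported lazily so locational_endpoint() stays usable without the library.
    from google.cloud import pubsub_v1

    return pubsub_v1.SubscriberClient(
        client_options={"api_endpoint": locational_endpoint(region)}
    )
```

Every subscriber instance attached to the exactly-once subscription would need to use the same region for the guarantee to hold.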

The behavioral change you must code for is on the ack side. With exactly-once, an ack/nack/modAck returns a status the client can observe, and only the most recent ack ID for a message is valid: once the deadline passes and the message is redelivered, the new delivery carries a new ack ID, and a request that uses the stale one fails with INVALID_ARGUMENT. The client libraries surface this through a future on the ack call. You must wait for the ack to confirm before treating the message as durably done:

from google.cloud import pubsub_v1
from google.cloud.pubsub_v1.subscriber import exceptions as sub_exceptions

subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path("PROJECT_ID", "orders-eo-sub")

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    try:
        process(message.data)  # placeholder for your own handler
    except Exception:
        # nack: let the retry policy decide redelivery timing
        try:
            message.nack_with_response().result()
        except sub_exceptions.AcknowledgeError:
            pass  # nack not accepted; the lease will expire and redeliver
        return

    # With exactly-once, ack_with_response() returns a future. Only treat
    # the message as done once the service confirms the ack succeeded.
    ack_future = message.ack_with_response()
    try:
        ack_future.result()  # raises AcknowledgeError if the ack was rejected
    except sub_exceptions.AcknowledgeError:
        # Ack failed (e.g. lease expired). The message WILL be redelivered;
        # do not commit any "already processed" marker here.
        return

flow = pubsub_v1.types.FlowControl(max_messages=100, max_bytes=50 * 1024 * 1024)
future = subscriber.subscribe(sub_path, callback=callback, flow_control=flow)
with subscriber:
    try:
        future.result()  # blocks until the stream fails or is cancelled
    except KeyboardInterrupt:
        future.cancel()  # stop pulling new messages
        future.result()  # wait for shutdown to finish

The critical discipline: do not record “I processed this” until the ack future resolves successfully. If the ack fails, the message is coming back, and your dedupe state must reflect that.
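
The discipline above can be sketched as a small helper that runs a commit hook only after the ack is confirmed. The helper and the commit callback are hypothetical; message is assumed to expose ack_with_response() the way the Python subscriber's Message does.

```python
from typing import Any, Callable


def ack_then_commit(message: Any, commit: Callable[[], None],
                    timeout: float = 30.0) -> bool:
    """Ack with confirmation, then run commit() only if the ack was accepted.

    commit is a hypothetical hook that records the "already processed" marker.
    """
    try:
        message.ack_with_response().result(timeout=timeout)
    except Exception:
        # Ack rejected or unconfirmed: the message may come back,
        # so leave no "processed" marker behind.
        return False
    commit()
    return True
```

A False return means the handler should assume the message may be redelivered and keep its dedupe state unchanged.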

3. Ordering keys: guarantees, trade-offs, and resume-on-failure

Pub/Sub does not order messages globally. With ordering keys, messages that share the same key, published to the same region, are delivered to a given subscription in publish order. Messages with an empty ordering key are not ordered. Enable ordering on the subscription:

gcloud pubsub subscriptions create accounts-ordered-sub \
  --topic=accounts \
  --enable-message-ordering \
  --ack-deadline=30

On the publisher, you must set enable_message_ordering=True and stamp each message with an ordering key that names the entity whose events must stay in order (an account ID, an aggregate ID), not a random per-message value, which orders nothing:

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient(
    publisher_options=pubsub_v1.types.PublisherOptions(enable_message_ordering=True)
)
topic_path = publisher.topic_path("PROJECT_ID", "accounts")

key = "acct-42"
future = publisher.publish(topic_path, b'{"event":"debit"}', ordering_key=key)
future.result()  # surfaces publish failures; a failure pauses this key (see below)

Two trade-offs you must design around:

Throughput cap. Publishing throughput per ordering key is limited to 1 MB/s. Ordering serializes a key, so a hot key is a hard bottleneck. Choose keys that spread load: per-customer, per-device, per-aggregate — not a single global key.
Redelivery cascades. If a message for a key is redelivered, all subsequent messages for that key are redelivered too, including already-acked ones, to preserve order. A single slow or failing message stalls its entire key, and unacked messages for one key can delay other keys during server restarts or rebalancing.
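
To check a candidate key scheme against the throughput cap above, a rough sketch like the following can flag hot keys from a sample of publish traffic. It assumes 1 MB means 10^6 bytes, and the 0.8 headroom factor is an arbitrary choice, not a Pub/Sub recommendation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

PER_KEY_LIMIT_BYTES_PER_SEC = 1_000_000  # the 1 MB/s per-key cap, as 10^6 bytes (assumption)


def hot_keys(sample: Iterable[Tuple[str, int]], window_sec: float,
             threshold: float = 0.8) -> List[str]:
    """Return ordering keys whose sampled publish rate exceeds threshold * cap.

    sample holds (ordering_key, payload_size_bytes) pairs observed over
    window_sec seconds.
    """
    bytes_per_key: Dict[str, int] = defaultdict(int)
    for key, size in sample:
        bytes_per_key[key] += size
    limit = PER_KEY_LIMIT_BYTES_PER_SEC * threshold
    return sorted(k for k, total in bytes_per_key.items()
                  if total / window_sec > limit)
```

Any key this returns is a candidate for splitting into finer-grained keys, provided ordering is not actually required across the split.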

Resume-on-failure is the publisher-side gotcha. If a publish for an ordering key fails, the client library pauses all further publishes for that key and fails them until you explicitly resume. After handling the failure, call resume_publish for that key, otherwise that key is stuck:

key = "acct-42"
future = publisher.publish(topic_path, b'{"event":"credit"}', ordering_key=key)
try:
    future.result()
except Exception:
    # All subsequent publishes for this key are now rejected until resumed.
    publisher.resume_publish(topic_path, key)

You can combine ordering with exactly-once (--enable-message-ordering --enable-exactly-once-delivery); the subscriber must then ack in order.
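
The resume-on-failure handling above can be wrapped in a small retry helper. This is a sketch: publish_ordered is a hypothetical name, publisher is assumed to be an ordering-enabled PublisherClient, and a retried publish can still produce a duplicate, so consumers stay idempotent.

```python
import time
from typing import Any


def publish_ordered(publisher: Any, topic_path: str, data: bytes, key: str,
                    attempts: int = 3, pause_sec: float = 1.0) -> str:
    """Publish one ordered message, resuming the key after each failure.

    Returns the message ID, or re-raises the last error once attempts run out.
    """
    for attempt in range(1, attempts + 1):
        try:
            return publisher.publish(topic_path, data, ordering_key=key).result()
        except Exception:
            # The key is paused after a failure; resume it before retrying.
            publisher.resume_publish(topic_path, key)
            if attempt == attempts:
                raise
            time.sleep(pause_sec)
    raise ValueError("attempts must be >= 1")
```

Retrying the same message before moving on keeps later events for that key from overtaking the failed one.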

4. Retry policies, exponential backoff, and redelivery

By default, when an ack deadline expires or a subscriber nacks, Pub/Sub redelivers immediately. A handler failing on a transient downstream dependency will hot-loop, hammering both the dependency and your error budget. Attach an exponential backoff retry policy so redelivery spreads out:

gcloud pubsub subscriptions update orders-eo-sub \
  --min-retry-delay=10s \
  --max-retry-delay=300s

Both bounds accept 0 to 600 seconds; when a retry policy is set without explicit values, the minimum defaults to 10 seconds and the maximum to 600 seconds. Pub/Sub starts near the minimum and grows the delay toward the maximum on repeated failures for the same message. Note the interaction with ordering: backoff on a key holds up that key’s later messages, which is usually what you want (don’t skip ahead past a failed event) but is worth stating in your design docs.
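
To get a feel for how the two bounds shape redelivery spacing, the sketch below prints a delay schedule. Pub/Sub does not document its exact growth curve, so the doubling factor here is purely an illustrative assumption, not the service's algorithm.

```python
from typing import List


def illustrative_backoff(min_delay: float, max_delay: float, failures: int,
                         factor: float = 2.0) -> List[float]:
    """Delays before each redelivery, ASSUMING the delay multiplies by factor.

    The real service only promises delays between the two bounds.
    """
    delays = []
    delay = min_delay
    for _ in range(failures):
        delays.append(min(delay, max_delay))  # never exceed the configured maximum
        delay *= factor
    return delays
```

Under that assumption, bounds of 10s and 300s give delays of 10, 20, 40, 80, 160, 300, 300 seconds across seven failures, which helps when estimating how long a poison message lingers before it is dead-lettered.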

To revert to immediate retry, clear the policy:

gcloud pubsub subscriptions update orders-eo-sub --clear-retry-policy

5. Dead-letter topics: configuration, IAM, and reprocessing

A retry policy delays poison messages but never stops them. A dead-letter topic caps delivery attempts and offloads the failures so the main subscription keeps flowing. Create a dedicated DLT plus a subscription on it (so messages are retained and inspectable), then attach the policy:

# 1. Dead-letter topic and a subscription to hold failures
gcloud pubsub topics create orders-dlq
gcloud pubsub subscriptions create orders-dlq-sub --topic=orders-dlq \
  --message-retention-duration=7d

# 2. Attach the dead-letter policy to the live subscription
gcloud pubsub subscriptions update orders-eo-sub \
  --dead-letter-topic=orders-dlq \
  --max-delivery-attempts=10

--max-delivery-attempts accepts 5 to 100 (default 5). It is approximate — Pub/Sub forwards on a best-effort basis — so don’t treat it as an exact counter.

The IAM step everyone forgets. Forwarding to the DLT and acking the original message are performed by the Pub/Sub service agent, not your identity. That agent needs two grants, and if you skip them the policy silently fails to forward. The service agent is service-PROJECT_NUMBER@gcp-sa-pubsub.iam.gserviceaccount.com:

PUBSUB_SA="service-PROJECT_NUMBER@gcp-sa-pubsub.iam.gserviceaccount.com"

# Publish forwarded messages into the dead-letter topic
gcloud pubsub topics add-iam-policy-binding orders-dlq \
  --member="serviceAccount:${PUBSUB_SA}" \
  --role="roles/pubsub.publisher"

# Acknowledge the undeliverable message on the source subscription
gcloud pubsub subscriptions add-iam-policy-binding orders-eo-sub \
  --member="serviceAccount:${PUBSUB_SA}" \
  --role="roles/pubsub.subscriber"

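Reprocessing pattern. Don’t point a consumer directly at the DLT in a loop, or you’ll recreate the hot-loop. Treat the DLQ as a quarantine: alert on it, triage, fix the bug or bad data, then replay. A clean replay path is a small job that pulls from orders-dlq-sub and republishes to the original orders topic once the root cause is resolved. On the source subscription, the client exposes delivery_attempt on each received message once a dead-letter policy is attached. Messages forwarded to the DLT carry CloudPubSubDeadLetterSource* attributes (for example CloudPubSubDeadLetterSourceDeliveryCount and CloudPubSubDeadLetterSourceSubscription), so your triage tooling can read the attempt count and origin directly off the message attributes.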
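
The replay job described above might look like the following sketch. replay_dead_letters is a hypothetical name; subscriber and publisher are assumed to be SubscriberClient and PublisherClient instances, and ordering keys are deliberately ignored here (pass ordering_key through an ordering-enabled publisher if per-key order must survive the replay).

```python
from typing import Any


def replay_dead_letters(subscriber: Any, publisher: Any, dlq_sub_path: str,
                        topic_path: str, batch_size: int = 50) -> int:
    """Pull one batch from the DLQ subscription, republish, then ack.

    Returns the number of messages replayed. Attributes added by dead-letter
    forwarding are dropped so the replayed message resembles the original.
    """
    response = subscriber.pull(
        request={"subscription": dlq_sub_path, "max_messages": batch_size}
    )
    ack_ids = []
    for received in response.received_messages:
        msg = received.message
        attrs = {k: v for k, v in msg.attributes.items()
                 if not k.startswith("CloudPubSubDeadLetterSource")}
        # Wait for the publish to succeed before queuing the DLQ ack.
        publisher.publish(topic_path, msg.data, **attrs).result()
        ack_ids.append(received.ack_id)
    if ack_ids:
        subscriber.acknowledge(
            request={"subscription": dlq_sub_path, "ack_ids": ack_ids}
        )
    return len(ack_ids)
```

Run it in a loop until it returns 0 a few times in a row, since a unary pull can come back empty while messages remain. Acking only after a confirmed publish means a crash mid-batch replays some messages twice rather than losing them.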

6. Subscriber flow control and outstanding-message tuning

Flow control is client-side, not a subscription property. StreamingPull will deliver as fast as it can; without limits, a subscriber pulls thousands of outstanding messages, blows its memory, and starts missing ack deadlines (which, with exactly-once or ordering, triggers exactly the redelivery storm you were trying to avoid). You bound concurrency with FlowControl:

from google.cloud import pubsub_v1

flow = pubsub_v1.types.FlowControl(
    max_messages=200,               # max outstanding (unacked) messages
    max_bytes=200 * 1024 * 1024,    # max outstanding bytes (200 MiB)
    max_lease_duration=600,         # cap total time the client extends a lease (s)
)
future = subscriber.subscribe(sub_path, callback=callback, flow_control=flow)

Tuning guidance:

Size max_messages to roughly handler_throughput_per_sec * p99_handler_latency_sec, then bound max_bytes to a safe fraction of container memory. Whichever limit is hit first pauses delivery.
The client auto-extends leases up to max_lease_duration. If a handler legitimately runs long, raise this so the lease isn’t lost mid-processing — but with a ceiling, so a wedged handler doesn’t pin a message forever.
StreamingPull pauses cleanly under backpressure: when flow-control limits are reached the server stops sending without breaking the connection, and resumes when capacity frees up. This is why StreamingPull, not unary Pull, is the default for throughput-sensitive workloads.
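
The sizing rule in the tuning guidance above is simple arithmetic, sketched here. The 0.25 memory fraction is an assumed safety margin, not a Pub/Sub recommendation; tune it to your container.

```python
import math
from typing import Tuple


def size_flow_control(throughput_per_sec: float, p99_latency_sec: float,
                      container_memory_bytes: int,
                      memory_fraction: float = 0.25) -> Tuple[int, int]:
    """Return (max_messages, max_bytes) for FlowControl.

    max_messages follows throughput * p99 latency; max_bytes is a fraction
    of container memory (memory_fraction is an assumption).
    """
    max_messages = max(1, math.ceil(throughput_per_sec * p99_latency_sec))
    max_bytes = int(container_memory_bytes * memory_fraction)
    return max_messages, max_bytes
```

For example, a handler that sustains 400 messages per second with a 0.5 s p99 latency in a 2 GiB container yields max_messages=200 and max_bytes of 512 MiB, which can be passed straight into pubsub_v1.types.FlowControl.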

Scale horizontally for throughput (more subscriber instances on the same subscription), and use flow control to keep each instance stable.

7. Push vs pull vs StreamingPull and managed export subscriptions

Pick the delivery mechanism to match the consumer:

Mechanism	When to use	Constraints
StreamingPull	High-throughput, low-latency, long-lived consumers (default)	Client-managed flow control; bidirectional stream
Unary Pull	Batch/cron consumers, simple control over fetch cadence	One response per request; higher latency at volume
Push	Webhook-style HTTP endpoints, Cloud Run/Functions	No exactly-once; ack via HTTP 2xx; service controls rate
BigQuery subscription	Stream straight into a BigQuery table	No subscriber code; schema must match
Cloud Storage subscription	Land batches as files in GCS	No subscriber code; batched by size/time

For sink-style ingestion, prefer the managed export subscriptions over hand-rolled consumers. A BigQuery subscription writes messages directly to a table with no subscriber to operate:

gcloud pubsub subscriptions create events-to-bq \
  --topic=events \
  --bigquery-table=PROJECT_ID:analytics.events \
  --use-topic-schema \
  --write-metadata

A Cloud Storage subscription batches messages to objects, flushing on a size or duration threshold:

gcloud pubsub subscriptions create events-to-gcs \
  --topic=events \
  --cloud-storage-bucket=my-events-bucket \
  --cloud-storage-file-prefix=events/ \
  --cloud-storage-max-duration=300s \
  --cloud-storage-max-bytes=100MB

Note: exactly-once is a pull-only guarantee. Push and the managed export subscriptions are at-least-once, so the destination must tolerate duplicates (dedupe in BigQuery on a message key; idempotent object naming in GCS).

8. Monitoring backlog, oldest unacked age, and expired acks

You operate Pub/Sub by watching a few subscription metrics in Cloud Monitoring. The three that matter most:

subscription/num_undelivered_messages — backlog size. A rising, non-draining backlog means consumers can’t keep up: scale out or speed up the handler.
subscription/oldest_unacked_message_age — age of the oldest unacked message, in seconds. This is your true freshness SLO. If it climbs toward your message-retention-duration, you are about to lose data.
subscription/expired_ack_deadlines_count — acks that missed their deadline. Sustained nonzero values mean handlers are too slow for the ack deadline, or flow control is letting too many messages outstanding. This directly causes redelivery (and, under ordering/exactly-once, cascades).

Watch dead-lettered volume via subscription/dead_letter_message_count, and on the publisher side keep an eye on topic/send_request_count error ratios.

A practical alerting policy in MQL — page when the oldest unacked message exceeds 10 minutes:

fetch pubsub_subscription
| metric 'pubsub.googleapis.com/subscription/oldest_unacked_message_age'
| filter (resource.subscription_id == 'orders-eo-sub')
| group_by 1m, [value_age_max: max(value.oldest_unacked_message_age)]
| condition value_age_max > 600 's'

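Quick CLI sanity check on the subscription’s ack deadline, retry policy, and dead-letter policy during an incident (describe shows configuration only; backlog and DLT depth come from the metrics above):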

gcloud pubsub subscriptions describe orders-eo-sub \
  --format="yaml(ackDeadlineSeconds, retryPolicy, deadLetterPolicy)"

Enterprise scenario

A payments platform team ran an orders topic feeding a ledger-posting service. Their first design used a single global ordering key to “guarantee” strict global order. In load testing they hit a wall at roughly 1 MB/s of publish throughput and could not push past it no matter how many subscriber instances they added. The cause was the per-ordering-key throughput cap: one key serializes everything through a single 1 MB/s lane, and subscriber scale-out cannot help a single-key bottleneck.

The constraint was real: posting two events for the same account out of order would corrupt a balance. But events for different accounts had no ordering relationship. The fix was to make the ordering key the account ID instead of a constant, turning one hot lane into thousands of independent ones, each with its own 1 MB/s budget. They paired it with a dead-letter topic (--max-delivery-attempts=10) so a single malformed event for one account couldn’t permanently stall that account’s lane, and added a resume_publish call on the publisher’s error path so a transient publish failure didn’t wedge a key. Aggregate throughput scaled with subscriber count, per-account ordering held, and the redelivery-cascade blast radius shrank from “the whole stream” to “one account.”

gcloud pubsub subscriptions create ledger-postings \
  --topic=orders \
  --enable-message-ordering \
  --enable-exactly-once-delivery \
  --ack-deadline=60 \
  --dead-letter-topic=orders-dlq \
  --max-delivery-attempts=10 \
  --min-retry-delay=10s \
  --max-retry-delay=300s

The lesson generalizes: ordering keys are a partitioning decision, not a correctness toggle. Choose the key at the granularity where order actually matters, and no coarser.

Verify

Confirm the configuration and behavior end to end:

# Subscription has exactly-once, ordering, retry, and dead-letter set
gcloud pubsub subscriptions describe ledger-postings \
  --format="yaml(enableExactlyOnceDelivery, enableMessageOrdering, retryPolicy, deadLetterPolicy)"

# Service agent has both required IAM grants
gcloud pubsub topics get-iam-policy orders-dlq \
  --format="table(bindings.role, bindings.members)"
gcloud pubsub subscriptions get-iam-policy ledger-postings \
  --format="table(bindings.role, bindings.members)"

# Publish a couple of ordered messages and confirm in-order receipt
gcloud pubsub topics publish orders --message='{"seq":1}' --ordering-key=acct-42
gcloud pubsub topics publish orders --message='{"seq":2}' --ordering-key=acct-42

# Confirm the subscription exists and print its full resource name
# (backlog freshness is a Monitoring metric, checked below)
gcloud pubsub subscriptions describe ledger-postings \
  --format="value(name)"

Then check Monitoring: oldest_unacked_message_age should stay low under steady load, expired_ack_deadlines_count should be near zero, and num_undelivered_messages should drain rather than grow. Force a handler error path to confirm messages land in orders-dlq after the configured attempts, and confirm the CloudPubSubDeadLetterSourceDeliveryCount attribute is present on the dead-lettered message.

Written by Vinod
