Azure Event Hubs at Scale: Partitioning, Capture, Kafka Endpoint, and Stream Analytics Processing

Event Hubs is the part of an Azure data platform that breaks in the least obvious way. A queue that overflows pages you; an ingestion pipeline that silently caps at its throughput-unit ceiling just adds latency, then drops the oldest events off the retention window while every dashboard stays green. The mistakes that cause this are made on day one — wrong partition count, a checkpoint store nobody load-tested, a consumer that commits offsets it never processed — and partition count in particular is immutable on the Standard tier. You live with the choice.

Azure Event Hubs is a managed, multi-tenant event-streaming service: a distributed, append-only commit log you write millions of events per second into, that fans the same stream out to many independent readers, archives it durably, and speaks the Kafka wire protocol so existing producers connect with no code change. It is the front door of nearly every Azure real-time analytics, IoT-telemetry and event-driven architecture. This guide builds a high-throughput pipeline the way it survives production: capacity sized against real numbers, partitions chosen for parallelism without lying about ordering, consumers that checkpoint correctly, Capture archiving every event to ADLS Gen2 for replay, the Kafka endpoint fronting existing producers, and an Azure Stream Analytics (ASA) job doing windowed aggregation with exactly-once output. Examples use the az eventhubs CLI, Bicep, and the modern Azure.Messaging.EventHubs SDK family — the supported successor to the old Microsoft.Azure.EventHubs and WindowsAzure.ServiceBus packages.

By the end you will stop guessing about capacity and ordering. You will size throughput units against egress not ingress, pick a partition count you cannot change later with eyes open, checkpoint after processing so a crash never loses data, and read the three metrics — ThrottledRequests, IncomingBytes/OutgoingBytes, and a derived consumer lag — that tell you the truth when a pipeline silently falls behind. Because this is a reference you will return to mid-incident, the tiers, limits, settings, error conditions and the operational playbook are all laid out as scannable tables: read the prose once, then keep the tables open when lag starts climbing at 6am.

What problem this solves

The pain is silent backpressure. A messaging system that can’t keep up should scream — a full queue, a rejected write, a dead-letter pile. Event Hubs does the opposite: under-provisioned, it throttles producers (adding latency, not errors at first), and under-consumed, it lets lag grow until events age past the retention window and are deleted, on time, exactly as configured. Nothing alarms because nothing is “broken” — the platform is doing precisely what you told it. The data is just gone, and you find out when a downstream report has a six-hour hole.

What breaks without the knowledge in this article: a team sizes a hub for “1 MB/s of telemetry,” forgets that three consumer groups each read the full stream, and is egress-throttled at a third of expected throughput. Or they pick 4 partitions for a 3,000-device fleet, the fleet grows to 120,000, and because at most one reader owns a partition, the enrichment job caps at 4 parallel readers forever — and partition count cannot be raised on Standard. Or they checkpoint before processing “to be safe” and a crash quietly drops every in-flight event. Or they enable Capture, lock the namespace behind a private endpoint, and never notice Capture stopped writing because the storage account lost its network path.

Who hits this: anyone building real-time analytics, IoT ingestion, clickstream or log pipelines, or migrating an existing Kafka workload onto Azure. It bites hardest on high-fan-out workloads (many consumer groups → egress-bound), high-cardinality-but-skewed keys (hot partitions), cost-sensitive Standard deployments (the immutable-partition trap), and anyone who treats Capture and checkpointing as fire-and-forget. The fix is almost never “throw more TUs at it” — it’s “size against egress, key for spread, checkpoint after work, and turn on Capture before you need it.”

To frame the whole field before the deep dive, here is every failure class this article covers, the question it forces, and where to look first:

Failure class	What’s actually happening	First question to ask	First place to look	Most common single cause
Producer throttling	Hit the TU/PU ceiling on ingress	Am I ingress- or egress-bound?	`ThrottledRequests` + Incoming vs Outgoing bytes	Egress = ingress × consumer groups, undersized
Consumer lag growth	Readers fall behind, events near retention	Is parallelism capped by partition count?	Derived lag: last enqueued seq − checkpoint	Too few partitions; one slow reader
Silent data loss	Events aged past retention before read	Did lag exceed the retention window?	Retention setting vs lag trend	Lag never alarmed; retention too short
Capture stopped	No Avro files appear in ADLS	Can Capture reach the storage account?	`ArchiveLogs`; container listing	Private endpoint / firewall blocks storage
Duplicate output	Sink has duplicate rows	Is this sink exactly-once or at-least-once?	Sink type vs ASA guarantee matrix	At-least-once sink + no dedup
Kafka client breaks	Producer/consumer errors on EH endpoint	Does the client use an unsupported API?	Client feature use vs supported set	Transactions / idempotent EOS not supported

Learning objectives

By the end of this article you can:

Size throughput units against the larger of ingress and total egress (ingress × consumer groups), and configure auto-inflate as a cost guardrail rather than a target.
Choose a partition count with full awareness that it is immutable on Standard, and pick a partition key that gives per-entity ordering without creating hot partitions.
Stand up consumers with the EventProcessorClient and a Blob checkpoint store, checkpointing after idempotent processing and in batches, so a restart resumes instead of replaying from retention.
Enable Capture to ADLS Gen2 with the right interval, size limit, skip-empty-archives, and {PartitionId} name format, and use the Avro archive for replay and late reprocessing.
Front existing producers and consumers with the Kafka endpoint on port 9093, and name exactly which Kafka APIs (transactions, idempotent EOS, admin) are not supported.
Write a Stream Analytics query using TIMESTAMP BY ... OVER <key> on event time with an explicit watermark, and match the sink to the exactly-once vs at-least-once guarantee you actually need.
Lock the namespace onto a VNet with a private endpoint in privatelink.servicebus.windows.net without breaking Capture, and decide when Premium/Dedicated is justified.
Diagnose throttling, lag and Capture failures from ThrottledRequests, IncomingBytes/OutgoingBytes, derived lag, and AzureDiagnostics logs — and map each symptom to a confirmed root cause and fix.

Prerequisites & where this fits

You should already understand the basics of cloud messaging — that a publish/subscribe system decouples producers from consumers, that at-least-once delivery means you can see an event more than once, and that ordering and parallelism are usually in tension. You should know how to run az in Cloud Shell, read JSON output, and reason about an Azure resource group, a managed identity, and RBAC role assignments. Familiarity with a lake/storage layer (Blob or ADLS Gen2) and with SQL helps for the Capture and Stream Analytics sections.

This sits in the Data & Analytics track and is the ingestion backbone beneath stream processing and the lake. It pairs tightly with Azure Cosmos DB: Partition-Key Design & RU Optimization — partitioning intuition transfers directly, and Cosmos DB is a common exactly-once ASA sink. It feeds Azure Data Factory, Synapse & Microsoft Fabric Deep Dive for batch/lakehouse processing of the captured Avro, and contrasts with Azure Service Bus: Sessions, Dedup & Dead-Letter Patterns (high-value transactional messaging) and Azure Event Grid: MQTT & Event-Driven Routing (discrete reactive events). The pressure dynamics here are the concrete Azure instance of Backpressure & Flow Control in Streaming Systems.

To place Event Hubs against its neighbours so you don’t pick the wrong service, here is the messaging-family map:

Service	Model	Best for	Ordering	Retention model	Not for
Event Hubs	Event streaming (log)	High-volume telemetry, clickstream, Kafka workloads	Per-partition	Time window (1–90d), re-readable	Per-message transactions, competing-consumer work queues
Service Bus	Brokered messaging (queue/topic)	Commands, orders, workflows needing transactions	Per-session (FIFO)	Until consumed / TTL	Millions-of-events/sec firehoses
Event Grid	Event routing (pub/sub)	Discrete reactive events, MQTT IoT, system events	None	Retry + dead-letter	High-throughput analytics streams
Storage Queue	Simple queue	Basic decoupling, large backlogs	None	7 days	Streaming, fan-out, ordering
Kafka (self-managed)	Event streaming (log)	Full Kafka API incl. transactions/Streams	Per-partition	Configurable	Teams who want zero broker ops

Where Event Hubs sits in a pipeline: producers (devices, apps, Kafka clients) write to a hub; the hub fans the stream to many consumer groups; Capture archives it; consumers (ASA, Functions, Spark) process it; sinks (SQL, Cosmos, the lake) store the results. Get the front door wrong and everything downstream inherits the latency, the lag, and the data loss.

Core concepts

Six mental models make every later decision obvious.

A hub is a partitioned, append-only log — not a queue. A queue hands each message to one consumer and deletes it; a hub keeps every event for the whole retention window and lets every consumer group read the entire stream independently, each at its own position. Nothing is “consumed” or removed by reading. This is why fan-out is cheap and why “the queue is full” is the wrong mental model — there is no draining, only a sliding retention window.

Tier decides the unit, the limits, and what you can change later. Standard bills in throughput units (TUs) and fixes partition count at creation. Premium and Dedicated bill in processing units (PUs) / capacity units (CUs), allow far higher partition counts, let you raise partition count after creation, and include Capture. Choosing the tier is the first irreversible decision; everything below assumes you made it deliberately.

A throughput unit is a two-sided rate limit, and the two sides differ. 1 TU = 1 MB/s or 1000 events/s ingress and 2 MB/s or 4096 events/s egress, whichever you hit first. Egress is double ingress on purpose: every event is typically read by multiple consumer groups, so a single 1 MB/s producer feeding three groups already needs 3 MB/s of egress. You are almost always egress-bound, not ingress-bound — and TUs are shared across the entire namespace, not per hub, so a noisy hub starves a quiet one.

Partition count is your maximum read parallelism AND your unit of ordering — simultaneously. A partition is an ordered, append-only sub-log. Ordering holds within a partition (send order) but never across partitions. And within one consumer group, at most one active reader owns a partition — so N partitions caps you at N concurrent processors. You cannot out-scale your partition count, and on Standard you cannot change it. The partition key is the lever: it hashes to a partition, keeping all events for one key ordered together, provided the key is high-cardinality so the spread stays even.

Consumers checkpoint; the checkpoint is the only thing that makes restart resumable. A consumer group is an independent cursor over the stream. The EventProcessorClient distributes partition ownership across instances, load-balances on scale, and persists progress to a checkpoint store (a Blob container). A checkpoint says “I am done through offset X.” Checkpoint before processing and a crash loses in-flight events (silent loss); checkpoint per event and the Blob store throttles. The rule is: checkpoint after successful, idempotent work, in batches — because Event Hubs is at-least-once and a restart reprocesses from the last checkpoint.

Capture is a free durable archive and the foundation of replay. Capture writes the raw stream to Blob/ADLS as Avro, with no consumer to run and no impact on your egress budget. It is the cheapest durable copy you will get and the answer to “a downstream had a bug for six hours — reprocess yesterday” without ever racing the live retention clock.

The vocabulary in one table

Before the deep sections, pin down every moving part. The glossary at the end repeats these for lookup; this table is the mental model side by side:

Concept	One-line definition	Where it lives	Why it matters at scale
Namespace	The container + capacity boundary for hubs	Resource group	TUs/PUs are shared here; noisy-neighbour isolation
Event hub	One partitioned log (a Kafka “topic”)	In a namespace	The stream you write to and read from
Throughput unit (TU)	Standard capacity: 1 MB/s in, 2 MB/s out	Namespace (Standard)	The rate ceiling you throttle against
Processing unit (PU)	Premium capacity reservation (CPU/mem)	Namespace (Premium)	Predictable latency, no per-sec throttle
Partition	Ordered, append-only sub-log	In a hub	Caps parallelism; ordering granularity
Partition key	Value hashed to choose a partition	Per event	Per-key ordering + even spread
Consumer group	Independent cursor over the stream	In a hub	One per downstream; fan-out
Checkpoint	“Processed through offset X” marker	Blob container	Makes restart resumable; loss if wrong
`EventProcessorClient`	Client that owns partitions + checkpoints	In your consumer app	Don’t hand-roll ownership/load-balance
Capture	Auto-archive the stream to Avro	Hub → Blob/ADLS	Replay, audit, late reprocessing
Kafka endpoint	Wire-compatible broker on :9093	Namespace (Std+)	Lift-and-shift Kafka clients
Stream Analytics (ASA)	SQL-over-streams engine	Separate resource	Windowed aggregation, exactly-once sinks
Auto-inflate	Auto-raise TUs up to a ceiling	Namespace (Standard)	Cost guardrail; ingress-only trigger
Retention	How long events are kept	Per hub	Lag past this = permanent data loss

Tiers, units, and limits — the reference

Half of the bad day-one decisions are a tier-and-limit table away from avoided. This is the canonical comparison — pick the tier before you provision, because the unit, the ceilings, and what you can change later all flip between them.

Capability	Basic	Standard	Premium	Dedicated
Capacity unit	TU	TU	PU	CU
Max throughput	1 TU (20 with quota)	up to 40 TU (more by request)	per-PU, scales with PUs	very high (multi-CU)
Consumer groups / hub	1	20	100	1000
Max retention	1 day	7 days	up to 90 days	up to 90 days
Capture	Not available	Paid add-on	Included	Included
Partition count change after create	No	No (immutable)	Yes (increase only)	Yes (increase only)
Max partitions / hub (default)	32	32	100 per PU	1024+ per CU
Kafka endpoint	No	Yes	Yes	Yes
Tenancy	Multi-tenant	Multi-tenant	Multi-tenant (isolated PUs)	Single-tenant
Dynamic partition scale-out	No	No	Yes	Yes
Typical use	Dev / low volume	Most workloads	Predictable latency, larger retention	100+ MB/s, compliance isolation

The two units behave differently enough that mixing their mental models causes mis-sizing. Here is TU versus PU side by side:

Dimension	Throughput unit (TU) — Standard	Processing unit (PU) — Premium
What it is	A two-sided rate limit (1 in / 2 out MB/s)	A CPU/memory reservation
Ingress per unit	1 MB/s or 1000 events/s	~5–10 MB/s (payload-dependent; validate)
Egress per unit	2 MB/s or 4096 events/s	Generous, no fixed per-sec throttle
Throttling model	`ThrottledRequests` when over	Latency degrades; no hard per-sec cap
Scaling	Auto-inflate (ingress trigger)	Add/remove PUs; no auto-inflate
Tenancy	Shared platform	Isolated compute per namespace
When to choose	Variable, cost-sensitive workloads	Predictable latency, larger retention, isolation

And the hard numbers you will actually bump into — keep this open when sizing or debugging a limit error:

Limit / quota	Standard value	Notes / how it bites
Ingress per TU	1 MB/s or 1000 events/s	Whichever trips first
Egress per TU	2 MB/s or 4096 events/s	Double ingress because of fan-out
Max TUs (default)	40 (20 self-service; more by request)	Beyond → Premium/Dedicated
Partitions per hub	1–32 default (more by request)	Immutable on Standard
Consumer groups per hub	20	Beyond → Premium
Max event size	1 MB (Standard); larger on Premium	Oversized send is rejected
Retention	1–7 days (Standard)	Lag past this = data loss
Capture interval	60–900 seconds	Size or time window, whichever first
Capture size limit	10 MB – 500 MB	Per-window file roll
Kafka port	9093 (TLS)	Not 9092 (no plaintext)
AMQP port	5671 (TLS)	5672 plaintext not allowed
Brokered connections	per-TU allotment	Many short-lived connections exhaust it

Throughput units, processing units, and auto-inflate

Capacity on the Standard tier is a throughput unit. The egress allowance is double ingress because every event is typically read by multiple consumer groups — so size against the larger of your two numbers, and that number is almost always egress. TUs are shared across the entire namespace: ten hubs in one namespace contend for one TU pool, and a noisy hub starves a quiet one. Isolate latency-sensitive workloads into their own namespace.

Auto-inflate raises TUs automatically up to a ceiling as load grows. It never scales down — that is a manual operation or a scheduled job. Treat the ceiling as a cost guardrail, not a target.

RG=rg-eh-telemetry
NS=eh-telemetry-prod          # globally unique
LOC=eastus

az group create -n $RG -l $LOC

# Standard namespace, auto-inflate 2 -> 10 TUs
az eventhubs namespace create -g $RG -n $NS -l $LOC \
  --sku Standard --capacity 2 \
  --enable-auto-inflate true \
  --maximum-throughput-units 10

resource ns 'Microsoft.EventHub/namespaces@2024-01-01' = {
  name: 'eh-telemetry-prod'
  location: location
  sku: { name: 'Standard', tier: 'Standard', capacity: 2 }
  properties: {
    isAutoInflateEnabled: true
    maximumThroughputUnits: 10
    minimumTlsVersion: '1.2'
  }
}

The Premium/Dedicated unit is the processing unit. PUs are CPU/memory reservations that deliver predictable latency and isolated tenancy; there is no per-second event throttle to reason about the way TUs impose. A rough planning figure is 5–10 MB/s ingress per PU depending on payload size and partition spread — but you validate with a load test, never a brochure number. Auto-inflate is an ingress trigger: if you are egress-bound, it may not fire even while consumers throttle, so alarm on egress throttling directly.

The behaviour of auto-inflate, option by option, because the defaults surprise people:

Aspect	Behaviour	Default	When to change	Gotcha
Direction	Scales up only	up-only	n/a	Never scales down — manual/scheduled
Trigger	Ingress throttling	ingress	n/a	Egress-bound workloads may never trigger it
Ceiling	`maximumThroughputUnits`	none until set	Set as a cost cap	Forgetting it = unbounded bill on a spike
Starting capacity	`--capacity`	1	Set to your steady-state	Too low → cold-start throttling before inflate
Reaction time	Near-real-time but not instant	—	—	Brief throttling during a sharp spike
Scope	Per namespace	namespace	—	All hubs share the inflated capacity

Sizing is arithmetic, not guesswork. The formula and a worked table:

required_TUs = ceil( max( ingress_MBps / 1 , total_egress_MBps / 2 ) ) where total_egress = ingress × number_of_consumer_groups, then add headroom.

Ingress	Consumer groups	Total egress	TUs by ingress	TUs by egress	Provision (with headroom)
1 MB/s	1	1 MB/s	1	1	2
1 MB/s	3	3 MB/s	1	2	3
5 MB/s	2	10 MB/s	5	5	6
5 MB/s	4	20 MB/s	5	10	12
10 MB/s	3	30 MB/s	10	15	18
20 MB/s	4	80 MB/s	20	40	consider Premium

Re-derive this every time you add a consumer group, because each one adds a full copy of egress.

Partition count: ordering vs parallelism

A partition is an ordered, append-only log, and two truths follow that are in tension: ordering is per-partition only (across partitions there is no global order), and parallelism is capped by partition count (within one consumer group, at most one active reader owns a partition, so N partitions means at most N concurrent processors). Partition count is therefore simultaneously your maximum read parallelism and the granularity at which ordering holds. The decision hinges on the partition key.

Send strategy	Ordering	Parallelism	Hot-partition risk	Use when
No key (round-robin)	None	Best balanced spread	Low	Order doesn’t matter; max throughput
Partition key (hashed)	Per key	Good if keys high-cardinality	Low with good key	Per-entity ordering (per device, per account)
Explicit partition id	Per partition	Manual; you pin yourself	High	Almost never

Always prefer a partition key over an explicit partition id. A key like region with five values wastes everything beyond five partitions and creates hot partitions; a key like deviceId with 120,000 values spreads evenly while keeping per-device ordering. Sizing rule: partitions ≥ peak TUs (or PU-equivalent throughput), and ≥ your maximum desired consumer parallelism, with headroom. On Standard, partition count is fixed at creation (1–32 by default) and cannot change — so over-provision modestly. 32 is a sane default for a hub you expect to grow; it costs nothing extra on Standard and gives you room.

EH=device-telemetry
az eventhubs eventhub create -g $RG --namespace-name $NS -n $EH \
  --partition-count 32 \
  --cleanup-policy Delete \
  --retention-time-in-hours 168     # 7 days (Standard max is 7d; Premium up to 90d)

resource hub 'Microsoft.EventHub/namespaces/eventhubs@2024-01-01' = {
  parent: ns
  name: 'device-telemetry'
  properties: {
    partitionCount: 32                 // immutable on Standard
    retentionDescription: {
      cleanupPolicy: 'Delete'
      retentionTimeInHours: 168
    }
  }
}

How the partition-count decision differs by tier, and the consequences:

Tier	Set at create	Change later	Max (default)	What “change” actually does
Standard	1–32	Never	32	n/a — a side-by-side migration is the only path
Premium	1–100/PU	Increase only	100 × PUs	New partitions start empty; mapping shifts for new sends
Dedicated	1–1024+/CU	Increase only	1024+ × CUs	Same as Premium; existing data not rebalanced

On Premium/Dedicated you can increase partition count after creation, but only upward, only with the Delete cleanup policy (not compaction), and existing data is not rebalanced — new partitions start empty and key-to-partition mapping shifts for new sends. Plan it as a migration, not a knob.

Choosing a partition key well is the whole game. A decision table:

If your key is…	Cardinality	Spread	Ordering you get	Verdict
`deviceId` / `userId` / `accountId`	Very high	Even	Per entity	Ideal
`region` / `tenant` (few values)	Low	Skewed, hot partitions	Per region	Avoid — hot partitions
`eventType` (handful)	Low	Skewed	Per type	Avoid
Composite (`region:deviceId`)	High	Even	Per composite	Good if you need region locality
Random GUID per event	Maximal	Even	None per entity	Use only when order is irrelevant
None (round-robin)	n/a	Best	None	Best raw throughput; no ordering

Consumer groups, checkpoint stores, and the processor client

A consumer group is an independent view over the hub with its own cursor — each group reads the full stream at its own pace. Give every distinct downstream its own group; never share one group across two unrelated apps, or they will steal partition ownership from each other. Standard allows up to 20 consumer groups per hub.

az eventhubs eventhub consumer-group create -g $RG --namespace-name $NS \
  --eventhub-name $EH -n stream-analytics
az eventhubs eventhub consumer-group create -g $RG --namespace-name $NS \
  --eventhub-name $EH -n enrichment-worker

The right consumer abstraction is the EventProcessorClient (or EventHubConsumerClient with a checkpoint store in Python/JS). It does three things you must not hand-roll: distributes partition ownership across instances, load-balances when instances are added or removed, and persists progress to a checkpoint store (a Blob container). Checkpointing is what makes restart resumable — without it, a restart replays from the configured start position.

// Azure.Messaging.EventHubs.Processor — checkpoints to Blob
var storage = new BlobContainerClient(
    new Uri("https://ehcheckpoints.blob.core.windows.net/enrichment"),
    new DefaultAzureCredential());

var processor = new EventProcessorClient(
    storage,
    consumerGroup: "enrichment-worker",
    fullyQualifiedNamespace: "eh-telemetry-prod.servicebus.windows.net",
    eventHubName: "device-telemetry",
    credential: new DefaultAzureCredential());

processor.ProcessEventAsync += async args =>
{
    await HandleAsync(args.Data);          // your idempotent work
    await args.UpdateCheckpointAsync();    // checkpoint AFTER successful processing
};
processor.ProcessErrorAsync += args =>
{
    Log.Error(args.Exception, "partition {P}", args.PartitionId);
    return Task.CompletedTask;
};

await processor.StartProcessingAsync();

Three rules carry the design — and each has a precise failure mode if you get it backwards:

Rule	Why	What breaks if you don’t	Tuning knob
Checkpoint after processing	A checkpoint is “done through X”	Crash loses every in-flight event — silent data loss	Move `UpdateCheckpointAsync` after the work
Checkpoint in batches	Every checkpoint is a Blob write	Per-event checkpointing throttles storage, dominates latency	Checkpoint every N events / few seconds
Make processing idempotent	Event Hubs is at-least-once	Reprocessing after a crash double-applies effects	Dedup on a business key; replay-safe effects

The checkpoint store itself has properties worth knowing before it bites you:

Checkpoint-store aspect	Detail	Why it matters
Backing store	A Blob container (one blob per partition)	Sized writes; can throttle on per-event checkpointing
Identity	`DefaultAzureCredential` / managed identity preferred	Avoid connection strings in consumers
Granularity	Per partition	Each owner writes its own progress blob
On startup with no checkpoint	Reads from configured start position	Wrong default → replay from retention start
On rebalance	Ownership blobs reassign partitions	Brief duplicate reads during handoff (idempotency covers it)
Throttling risk	429s from the storage account	Batch checkpoints; co-locate store region

Consumer client choices, so you pick the right one for the language and job:

Client	Language	Owns partitions?	Checkpoints?	Use when
`EventProcessorClient`	.NET	Yes	Yes (Blob)	Production multi-instance consumers
`EventHubConsumerClient` (+ checkpoint store)	Python / JS	Yes (with store)	Yes	Production in those languages
`EventHubConsumerClient` (no store)	.NET / others	No	No	Quick reads, debugging, single reader
`PartitionReceiver`	.NET	Manual (one partition)	Manual	Fine-grained control; you own ownership
Kafka consumer	Any	Yes (Kafka groups)	Kafka offsets	Lift-and-shift existing Kafka consumers

Event Hubs Capture to ADLS with Avro and replay

Capture automatically writes the raw stream to Blob or ADLS Gen2 as Avro, with no consumer to run and no impact on your TU egress budget. It is the cheapest durable archive you will get and the foundation of replay, late-arriving reprocessing, and audit. Files roll on a size or time window, whichever trips first.

SA=ehcapturelake          # ADLS Gen2 account, hierarchical namespace ON

az eventhubs eventhub update -g $RG --namespace-name $NS -n $EH \
  --enable-capture true \
  --capture-interval 300 \
  --capture-size-limit 314572800 \
  --skip-empty-archives true \
  --destination-name EventHubArchive.AzureBlockBlob \
  --storage-account "/subscriptions/<sub-id>/resourceGroups/$RG/providers/Microsoft.Storage/storageAccounts/$SA" \
  --blob-container capture \
  --archive-name-format '{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}'

--capture-interval is seconds (60–900), --capture-size-limit is bytes (10 MB–500 MB). Always set --skip-empty-archives true — otherwise quiet partitions write empty Avro files every interval and you pay for storage, transactions, and downstream scan time on noise. The archive-name-format must contain {PartitionId}, {Namespace}, and {EventHub} (the time tokens are optional but you want them for partition pruning). Each Avro record carries SequenceNumber, Offset, EnqueuedTimeUtc, Body, and Properties, so the archive is self-describing — you can reconstruct exact stream order per partition from the captured files alone.

Every Capture setting, its range, default, and the trade-off:

Setting	Values	Default	When to change	Trade-off / gotcha
`--enable-capture`	true / false	false	Turn on for replay/audit	Off = no durable archive
`--capture-interval`	60–900 s	300	Lower for fresher files	Smaller, more frequent files
`--capture-size-limit`	10–500 MB (bytes)	314572800	Lower to bound file size	Many small files vs fewer large
`--skip-empty-archives`	true / false	false	Always set true	Default writes empty files on idle partitions
`--destination-name`	`EventHubArchive.AzureBlockBlob`	—	Required	Misnamed = Capture won’t start
`--storage-account`	full resource id	—	Required	Wrong id / no path = no writes
`--blob-container`	container name	—	Required	Must exist or be creatable
`--archive-name-format`	token template	platform default	Add time tokens for pruning	Must include `{PartitionId}`

The capture destination format and what each token gives you in the lake path:

Token	Expands to	Why include it
`{Namespace}`	The namespace name	Required; multi-namespace lakes
`{EventHub}`	The hub name	Required; multi-hub separation
`{PartitionId}`	Partition index	Required; per-partition ordering on replay
`{Year}/{Month}/{Day}`	UTC date parts	Date-partition pruning in queries
`{Hour}/{Minute}/{Second}`	UTC time parts	Fine-grained time-window replay

Capture writes Avro only — there is no JSON or Parquet option. If your lake standard is Parquet, run a downstream conversion (a Synapse/Databricks job, or Stream Analytics with a Parquet output). Do not try to “fix” the Capture format; you cannot. See Azure Data Factory, Synapse & Microsoft Fabric Deep Dive for the conversion job, and Azure Blob Storage: Lifecycle, Immutability & Soft Delete to age the Avro out cheaply.

Replay is “read the Avro back.” Point Spark, Stream Analytics (reference/blob input), or a backfill job at the container and filter by the time path. Capture is the answer to “a downstream had a bug for six hours, reprocess yesterday” without ever touching live retention. The replay options compared:

Replay method	Reads from	Best for	Watch out for
Re-read live stream (rewind cursor)	The hub (within retention)	Recent reprocessing inside the window	Only works if data still in retention
Capture Avro via Spark/Databricks	ADLS	Large historical backfills	Avro schema handling; partition order
ASA blob/reference input	ADLS	Stream-style replay into a job	Event-time vs file-time semantics
ADF/Synapse pipeline	ADLS	Scheduled batch reprocessing	Format conversion (Avro→Parquet)
Custom reader (SequenceNumber order)	ADLS	Exact-order reconstruction	Must sort by SequenceNumber per partition

Kafka-protocol endpoint for existing producers and consumers

Every Standard+ namespace exposes a Kafka endpoint on port 9093 — your existing Kafka producers and consumers connect to Event Hubs as if it were a broker, no code change, just config. A namespace maps to a bootstrap server; an event hub maps to a Kafka topic; partitions and consumer groups map 1:1. Authentication is SASL/PLAIN over TLS, where the username is the literal string $ConnectionString and the password is the namespace connection string.

# Kafka client config pointing at an Event Hubs namespace
bootstrap.servers=eh-telemetry-prod.servicebus.windows.net:9093
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="$ConnectionString" \
  password="Endpoint=sb://eh-telemetry-prod.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=...";

For production, prefer OAuth (SASL/OAUTHBEARER) with Microsoft Entra ID over a connection string so you are not shipping a shared key into every client. Either way, the endpoint is wire-compatible enough for librdkafka, the Java client, Kafka Connect, and most Kafka tooling.

The concept mapping, so a Kafka engineer can translate in their head:

Kafka concept	Event Hubs equivalent	Notes
Bootstrap server	Namespace FQDN `:9093`	One per namespace
Topic	Event hub	1:1
Partition	Partition	1:1; same ordering semantics
Consumer group	Consumer group	1:1; 20 max on Standard
Offset	Offset / SequenceNumber	Re-readable within retention
Broker	The platform	No broker config you manage
Replication factor	Managed by platform	Not a knob you set

Know the gaps before you bet on it. The supported-vs-not table is the difference between a clean lift-and-shift and a broken Streams app:

Kafka feature	Supported on EH?	If you need it
Produce / consume (basic)	Yes	Just works
Consumer groups + offsets	Yes	1:1 mapping
SASL/PLAIN over TLS	Yes	Use `$ConnectionString`
SASL/OAUTHBEARER (Entra ID)	Yes	Preferred for prod
Kafka Connect	Yes (most connectors)	Validate the specific connector
Transactions API	No	Redesign; or self-managed Kafka
Idempotent producer (EOS)	No	Dedup downstream instead
Kafka Streams (txn-dependent)	Partial/No	Use ASA or self-managed Kafka
Compacted topics (Kafka semantics)	No (EH has its own log compaction)	Configure EH compaction separately
Broker-level admin ops	No	Use `az`/ARM for management

It is excellent for lift-and-shift of producers/consumers; it is not a drop-in for a Kafka Streams app that depends on transactions. Validate your specific client’s feature use against the supported set above.

Stream Analytics: windowing, watermarks, and exactly-once

Azure Stream Analytics (ASA) reads an event hub as input and runs SQL-like queries with temporal windows. The four window types are the vocabulary of stream processing:

Window	Shape	Overlap	Emits	Use
Tumbling	Fixed, non-overlapping	None	At window close	Per-minute counts, billing buckets
Hopping	Fixed size, hops by an interval	Yes (by hop)	Each hop	Moving aggregates (“last 5 min, every 1 min”)
Sliding	Variable, event-driven	Continuous	On event arrival/expiry	“alert if >N in any 30s”
Session	Gap-defined, variable length	None	On gap timeout	User sessions, activity bursts
Snapshot	Grouping by exact timestamp	n/a	At each timestamp	`System.Timestamp()` grouping

The non-negotiable detail is event time vs arrival time. By default ASA windows on arrival time; for correct results under network delay and retries you must declare the event timestamp and a watermark tolerance via TIMESTAMP BY ... OVER .... The watermark is how far ASA waits for stragglers before closing a window — too tight drops late events, too loose adds latency. Make it explicit.

-- Per-device 1-minute counts on EVENT time, partitioned for scale
SELECT
    deviceId,
    System.Timestamp() AS windowEnd,
    COUNT(*)           AS events,
    AVG(temperature)   AS avgTemp
INTO   [aggregates-output]
FROM   [telemetry-input] TIMESTAMP BY enqueuedAt OVER deviceId
GROUP BY deviceId, TumblingWindow(minute, 1)

TIMESTAMP BY enqueuedAt OVER deviceId applies the timestamp per partition key — substreams — so one slow device cannot hold the global watermark hostage. This is the single highest-leverage clause in an ASA query at scale.

Exactly-once is real but conditional. ASA guarantees exactly-once processing internally, and exactly-once delivery end-to-end only to specific sinks. Match your sink to your guarantee:

Sink	Delivery guarantee	Notes
Azure SQL Database	Exactly-once	Native EOS path
Cosmos DB	Exactly-once	Upsert on id; see Cosmos partition design
Synapse / Blob via Delta (v2 path)	Exactly-once	Streaming-ingestion / Delta output
Blob / ADLS (JSON, CSV, Avro, Parquet)	At-least-once	Sink must dedup
Event Hubs output	At-least-once	Downstream dedup
Service Bus	At-least-once	Use dedup window
Power BI	At-least-once (streaming)	Dashboard, not source of truth
Table storage / others	At-least-once	Dedup on key

To use all partitions in parallel, make the job embarrassingly parallel: keep the same partition key from input through GROUP BY to output, and align input partition count, query partitioning, and streaming units (SUs). The ASA scaling levers:

Lever	What it controls	Symptom it fixes	Limit / note
Streaming Units (SUs)	Compute allocated to the job	High SU% util, watermark delay	Must be a multiple compatible with partitioning
Query partitioning (`PARTITION BY`)	Parallel substreams	Single-threaded bottleneck	Align to input partition key
Input partition count	Read parallelism	Under-utilised SUs	Inherited from the hub
Output partitioning	Parallel writes to sink	Sink write bottleneck	Sink must support it
Compatibility level	Engine semantics version	Subtle behaviour changes	Pin and test before raising

Add SUs when the job’s SU% utilization or watermark delay climbs.

Networking: private endpoints and dedicated clusters

By default the namespace has a public FQDN. To pull it onto the VNet, disable public access and add a private endpoint, which maps the namespace into private DNS zone privatelink.servicebus.windows.net (Event Hubs shares the Service Bus DNS namespace — this is correct, not a typo).

# Lock down public access, then private endpoint onto the VNet
az eventhubs namespace update -g $RG -n $NS --public-network-access Disabled

az network private-endpoint create -g $RG -n pe-$NS \
  --vnet-name vnet-data --subnet snet-pe \
  --private-connection-resource-id $(az eventhubs namespace show -g $RG -n $NS --query id -o tsv) \
  --group-id namespace \
  --connection-name eh-conn

az network private-dns zone create -g $RG -n privatelink.servicebus.windows.net
az network private-dns link vnet create -g $RG -n link-eh \
  --zone-name privatelink.servicebus.windows.net --virtual-network vnet-data --registration-enabled false
az network private-endpoint dns-zone-group create -g $RG \
  --endpoint-name pe-$NS -n zg \
  --private-dns-zone privatelink.servicebus.windows.net --zone-name servicebus

The network-access controls and what each one does to traffic:

Control	What it does	Default	When to use	Gotcha
`--public-network-access Disabled`	Blocks all public traffic	Enabled	Private-only namespaces	Breaks Capture unless storage has a path
Private endpoint (`namespace` group)	Maps namespace into the VNet	none	VNet-only access	Needs the `servicebus` private DNS zone
Private DNS zone	`privatelink.servicebus.windows.net`	none	Resolve the PE privately	Shared with Service Bus — by design
IP firewall rules	Allow specific CIDRs	none	Partner/edge access	Maintenance burden; prefer PE
Trusted Microsoft services	Let Capture/ASA reach it	off	Capture to locked storage	Forgetting it = “private networking broke Capture”
Kafka over PE	Still port 9093	—	Kafka clients on the VNet	Same DNS/PE path as AMQP

The Kafka endpoint over a private endpoint still uses port 9093, and Capture still needs a network path to the storage account — give the storage account a private endpoint too, or enable the trusted Microsoft services firewall exception so Capture can write. Forgetting this is the classic “private networking broke Capture” outage. For the DNS mechanics at scale, see Private Endpoints & Private DNS at Scale and the Private Endpoint vs Service Endpoint trade-off.

A Dedicated cluster is the top tier: single-tenant capacity in CUs, the highest partition counts, 90-day retention, and isolation from noisy-neighbour TU contention. You move to Dedicated when sustained throughput crosses roughly the 100+ MB/s range or when tenancy/compliance demands single-tenant isolation — not before, because it carries a substantial fixed monthly floor.

Architecture at a glance

The diagram traces the data as it actually flows — left to right — and pins each throughput-or-correctness failure to the exact hop where it bites. Read it from the left: producers send over the Kafka endpoint (:9093, SASL_SSL) or the Azure.Messaging SDK with a partition key. Both land on the namespace, where the shared TU pool (1 MB/s in, 2 MB/s out per TU) and auto-inflate (an ingress trigger, 2→10 TU) govern capacity — and badge 1 marks the egress-bound throttle that fires here when consumer-group fan-out outruns your TUs. Inside the hub the stream lands on partitions (32, immutable on Standard, one reader each), where badge 2 marks the hot partition a low-cardinality key creates and the lag that grows behind it.

From the partitions the stream fans out two ways: Capture writes Avro to ADLS every 60–900s (badge 4 = Capture broke after a network lockdown), and consumer groups read while a checkpoint store in Blob tracks progress (badge 3 = checkpoint loss or a replay storm from checkpointing before processing). Finally a consumer group feeds the Stream Analytics job — TIMESTAMP BY ... OVER key, scaled by SUs — into an exactly-once sink (SQL/Cosmos) where badge 5 marks the exactly-once illusion of writing to an at-least-once sink. The legend narrates every number as symptom · confirm · fix: localise your problem to a hop, read the cause, run the named check, apply the fix. The single most important read of the diagram: throttling lives at the TU pool (egress, not ingress), and data loss lives at the partitions (lag past retention) — two different hops, two different fixes.

Real-world scenario

Cartos Logistics runs a fleet-telemetry platform: ~120,000 vehicles each emit a GPS-and-diagnostics ping every 5 seconds into a single Standard hub created two years earlier with 4 partitions — chosen when the fleet was 3,000 trucks and 4 felt generous. The namespace runs at 6 TUs with auto-inflate to 10. Two consumer groups read the stream: enrichment-worker (joins route metadata, writes to Cosmos DB) and stream-analytics (per-region speed and idle-time aggregates). Monthly Event Hubs spend is about ₹14,000.

The incident surfaced as a customer complaint, not an alert. A logistics customer’s route-history report for the previous morning had gaps — vehicles that clearly moved showed no positions for a 40-minute stretch. Every dashboard was green. The on-call engineer’s first reflex was to raise TUs (the namespace was occasionally near its egress ceiling during the morning surge), which did nothing, because the problem was not capacity — it was consumer lag exceeding retention.

The breakthrough was deriving lag by hand. There is no single “lag” metric, so the engineer compared each partition’s LastEnqueuedSequenceNumber against the enrichment-worker group’s checkpointed sequence number. On all four partitions the gap was growing through the 7–9am surge and only recovering by mid-morning. With 4 partitions, the enrichment consumer could run at most 4 parallel readers — it had been scaled to 12 instances, but 8 of them sat idle owning nothing, because ownership is capped at one reader per partition. During the surge the 4 active readers fell behind, lag grew, and events that weren’t read within the 7-day retention were deleted on schedule. The “missing” positions had aged out before enrichment ever saw them. No amount of consumer scaling could help: 4 partitions is a hard ceiling of 4 readers, and partition count is immutable on Standard.

The fix was a side-by-side migration, not an in-place change. They provisioned a new hub with 64 partitions, kept the partition key as vehicleId (very high cardinality, so the spread is even and per-vehicle ordering holds), and dual-wrote from producers to old and new hubs for one full retention window. The enrichment service was repointed at the new hub’s consumer group and scaled to 16 instances (well under the 64-partition ceiling, leaving headroom), and lag dropped from 40 minutes to seconds. Capture was enabled on the new hub from minute one, writing Avro to ADLS with skip-empty-archives true — so any future consumer bug is recoverable by replaying yesterday’s Avro instead of racing the retention clock. The old hub was decommissioned after the cutover window closed.

# New hub sized for the real fleet, Capture on from day one
az eventhubs eventhub create -g $RG --namespace-name $NS -n vehicle-telemetry-v2 \
  --partition-count 64 \
  --retention-time-in-hours 168 \
  --enable-capture true --capture-interval 300 --skip-empty-archives true \
  --destination-name EventHubArchive.AzureBlockBlob \
  --storage-account "/subscriptions/<sub-id>/resourceGroups/$RG/providers/Microsoft.Storage/storageAccounts/$SA" \
  --blob-container capture \
  --archive-name-format '{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}'

The incident as a timeline, because the order of moves is the lesson:

Time	Symptom	Action taken	Effect	What it should have been
Day 0	Customer reports route gaps	(ticket filed)	—	Ask: did lag exceed retention?
+1h	“Maybe capacity”	Raise TUs 6 → 10	No change	TUs don’t fix a partition ceiling
+2h	Still gaps	Derive lag per partition	Lag grows on all 4 during surge	This was the breakthrough
+3h	Root cause found	Realise 4 partitions = 4 readers	Immutable on Standard	—
+1 day	Plan	Provision 64-partition hub, dual-write	Both hubs receiving	Side-by-side, never in-place
+1 week	Cutover	Repoint enrichment, scale to 16	Lag → seconds	Correct fix
+1 week	Durability	Enable Capture from minute one	Future bugs = replay, not loss	Should have been day one

The lesson the team internalized: partition count is a day-one capacity decision with no day-two undo on Standard, and lag past retention is silent data loss. Over-provision partitions (they are nearly free), keep a high-cardinality key, alarm on derived lag against the retention window, and turn on Capture before you need it — so the next surprise is a replay, not a data-loss incident.

Advantages and disadvantages

The log-with-fan-out model both enables massive-scale ingestion and causes this class of silent failure. Weigh it honestly:

Advantages (why this model helps you)	Disadvantages (why it bites)
Massive throughput — millions of events/sec, scales with TUs/PUs	Capacity is a two-sided limit you size on egress, not ingress — easy to mis-size
Fan-out is free — every consumer group reads the full stream independently	Each consumer group adds a full copy of egress; fan-out quietly makes you egress-bound
Re-readable retention enables replay and multiple independent consumers	Lag past retention is silent data loss — no alarm, events just deleted on schedule
Per-partition ordering with a good key	Ordering only per partition; partition count caps parallelism, immutable on Standard
Capture is a near-free durable archive with no consumer to run	Avro-only; locking the namespace can silently break Capture’s storage path
Kafka endpoint lets existing producers/consumers lift-and-shift with no code change	No transactions / idempotent-EOS / Kafka Streams parity — some apps won’t port
Stream Analytics gives SQL-over-streams with exactly-once to key sinks	Exactly-once only to SQL/Cosmos/Delta — most sinks are at-least-once, you must dedup
Auto-inflate absorbs ingress spikes automatically	Auto-inflate is ingress-only and never scales down — egress throttling stays invisible

The model is right for high-volume telemetry, clickstream, log and IoT pipelines and for Kafka workloads you want to run without broker ops. It bites hardest on high-fan-out workloads (egress sizing), skewed keys (hot partitions), Standard deployments that under-provision partitions (the immutable trap), and anyone who treats Capture, checkpointing and lag monitoring as fire-and-forget. The disadvantages are all manageable — but only if you know they exist, which is the point of this article.

Hands-on lab

Stand up a Standard namespace, a hub with Capture, send and receive a keyed event, prove ordering, and watch Capture write Avro — all low-cost (Standard 1 TU + a storage account; delete at the end). Run in Cloud Shell (Bash).

Step 1 — Variables and resource group.

RG=rg-eh-lab
LOC=centralindia
NS=ehlab$RANDOM            # globally unique
EH=telemetry
SA=ehlablake$RANDOM       # globally unique, lowercase
az group create -n $RG -l $LOC -o table

Step 2 — Create a Standard namespace (1 TU, auto-inflate to 4).

az eventhubs namespace create -g $RG -n $NS -l $LOC \
  --sku Standard --capacity 1 \
  --enable-auto-inflate true --maximum-throughput-units 4 -o table

Expected: a namespace row with sku.name = Standard, isAutoInflateEnabled = true.

Step 3 — Create an ADLS Gen2 account for Capture.

az storage account create -g $RG -n $SA -l $LOC \
  --sku Standard_LRS --kind StorageV2 --hierarchical-namespace true -o table
az storage container create --account-name $SA -n capture --auth-mode login

Step 4 — Create the hub with 8 partitions and Capture on.

SAID=$(az storage account show -g $RG -n $SA --query id -o tsv)
az eventhubs eventhub create -g $RG --namespace-name $NS -n $EH \
  --partition-count 8 --retention-time-in-hours 24 \
  --enable-capture true --capture-interval 60 --skip-empty-archives true \
  --destination-name EventHubArchive.AzureBlockBlob \
  --storage-account "$SAID" --blob-container capture \
  --archive-name-format '{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}' \
  -o table

Expected: partitionCount = 8, captureDescription.enabled = true.

Step 5 — Grant yourself the data role and send a keyed event. Use the SDK or the Kafka endpoint; here, the simplest path is a connection string with Send/Listen (the lab uses a SAS rule, not for production):

az eventhubs eventhub authorization-rule create -g $RG --namespace-name $NS \
  --eventhub-name $EH -n labrule --rights Listen Send
CONN=$(az eventhubs eventhub authorization-rule keys list -g $RG --namespace-name $NS \
  --eventhub-name $EH -n labrule --query primaryConnectionString -o tsv)
echo "Connection string captured (do not commit)."

Send a few events with the same partition key using any SDK quickstart, setting PartitionKey = "device-42" on each. All of them must land on one partition.

Step 6 — Confirm ordering and read the partition. With a consumer, read the events and print PartitionId — every device-42 event shows the same partition id, proving the key→partition hash holds order.

Step 7 — Wait one capture interval, then list the Avro.

sleep 90    # one 60s interval + roll time
az storage blob list --account-name $SA -c capture --auth-mode login \
  --prefix "$NS/$EH/" -o table

Expected: Avro blobs under .../{PartitionId}/.../{Minute} for the partition you wrote to — and no empty files on idle partitions (because skip-empty-archives = true).

Step 8 — Verify configuration end to end.

az eventhubs eventhub show -g $RG --namespace-name $NS -n $EH \
  --query '{partitions:partitionCount, retentionHrs:retentionDescription.retentionTimeInHours, capture:captureDescription.enabled, skipEmpty:captureDescription.skipEmptyArchives}'
az eventhubs namespace show -g $RG -n $NS \
  --query '{sku:sku.name, capacity:sku.capacity, autoInflate:isAutoInflateEnabled, maxTU:maximumThroughputUnits, publicAccess:publicNetworkAccess}'

Step 9 — Teardown (stops all charges).

az group delete -n $RG --yes --no-wait

What each step proves, at a glance:

Step	Proves	If it fails, suspect
2	Namespace + auto-inflate config	Name not globally unique
3	ADLS Gen2 with hierarchical namespace	HNS flag omitted
4	Hub + Capture wiring	Wrong storage id / missing `{PartitionId}`
5–6	Keyed send → single partition (ordering)	Key not set; round-robin spread
7	Capture writes Avro, skips empty	Storage path/role; interval not elapsed
8	End-to-end config readback	Any of the above

Common mistakes & troubleshooting

This is the section you reopen mid-incident. Each failure mode is symptom → root cause → confirm (exact command/path) → fix. Scan the playbook table first, then read the detail for the row that matches.

#	Symptom	Root cause	Confirm (exact command / portal path)	Fix
1	Producers throttled, ingress looks fine	Egress-bound: fan-out × ingress > TUs	`ThrottledRequests > 0`; `OutgoingBytes >> IncomingBytes`	Size TUs on egress; raise TUs / Premium
2	Consumer lag grows during peak	Parallelism capped by partition count	Derive lag: `LastEnqueuedSequenceNumber − checkpoint` per partition	Scale to ≤ partition count; re-partition (new hub)
3	Report has gaps; dashboards green	Lag exceeded retention; events deleted	Lag trend vs `retentionTimeInHours`	Raise retention; fix lag; replay from Capture
4	No Avro files appear in ADLS	Capture can’t reach storage (PE/firewall)	`AzureDiagnostics ... ArchiveLogs` errors; empty container	PE on storage or trusted-services exception
5	Storage account 429s; latency spikes	Per-event checkpointing	Checkpoint frequency in code; storage throttling metric	Checkpoint in batches every N events
6	Restart replays the whole stream	Checkpoint never written / before processing	Restart reads from retention start	`UpdateCheckpointAsync` AFTER processing
7	Duplicate rows in the sink	At-least-once sink + no dedup	Sink type vs EOS matrix	EOS sink (SQL/Cosmos) or dedup on key
8	Empty Avro files everywhere, storage bill up	`skip-empty-archives` not set	List container; many 0-byte Avro	Set `--skip-empty-archives true`
9	Hot partition: one reader pinned at 100%	Low-cardinality partition key	Per-partition throughput skew	High-cardinality key; re-partition
10	ASA watermark delay climbs, results late	Under-provisioned SUs / not parallel	ASA SU% util + watermark delay metric	Add SUs; `PARTITION BY` the key
11	Kafka client errors on transactions	Endpoint doesn’t support Kafka txns/EOS	Client logs: txn/idempotent errors	Remove txn dependency or self-managed Kafka
12	“Private networking broke everything”	Public access off, no PE/DNS for clients	`publicNetworkAccess = Disabled`; no PE	Add PE + private DNS zone (servicebus)
13	Auto-inflate never fires under load	Auto-inflate is ingress-only	Throttling on egress while ingress low	Raise TUs manually; alarm on egress
14	Two apps fight over partitions	Shared consumer group	Same group name in two apps	One consumer group per downstream

Mistake 1 — Sizing TUs on ingress, then egress-throttling

You provision for “1 MB/s of telemetry,” but three consumer groups each read the full stream, so you actually need 3 MB/s of egress. The namespace throttles producers, latency climbs, and the ingress graph looks healthy the whole time.

Confirm. Compare incoming vs outgoing bytes and check throttling:

NSID=$(az eventhubs namespace show -g $RG -n $NS --query id -o tsv)
az monitor metrics list --resource "$NSID" \
  --metric ThrottledRequests IncomingBytes OutgoingBytes \
  --interval PT1M --aggregation Total -o table

OutgoingBytes materially exceeding IncomingBytes with non-zero ThrottledRequests is the signature. Fix: re-size TUs against ingress × consumer_groups / 2, raise the auto-inflate max, and if you are at the Standard ceiling, move to Premium. Auto-inflate will not save you here — it triggers on ingress.

Mistake 2 & 3 — Lag growth and silent data loss past retention

There is no built-in “lag” metric — you derive it. A growing gap between a partition’s last enqueued sequence number and your checkpoint means a consumer is falling behind; if that gap implies a time older than your retention window, events are being deleted before you read them.

Confirm. Per partition, compare last-enqueued vs checkpoint. From the SDK you read LastEnqueuedSequenceNumber off partition properties; the checkpoint lives in your Blob store. As a quick portal proxy, watch the retention window against how far behind the consumer is. The mechanism — not a fake metric — is the point: derive lag, then compare it to retentionTimeInHours.

Fix. Scale the consumer out up to the partition count (more instances beyond that sit idle); if you are already at the partition ceiling on Standard, the only fix is a side-by-side migration to a hub with more partitions (see the real-world scenario). Raise retention to buy recovery time, and replay from Capture to backfill what was lost. Alarm on derived lag relative to retention, not on an absolute number.

Mistake 4 & 8 — Capture broke, or wrote a pile of empty files

Two opposite Capture failures. Lock the namespace behind a private endpoint without giving the storage account a network path and Capture silently stops writing — no error to the producers, just no files. Conversely, leave skip-empty-archives at its default and every idle partition writes an empty Avro file every interval, inflating storage, transaction and scan costs.

Confirm. Check ArchiveLogs and list the container:

AzureDiagnostics
| where ResourceProvider == "MICROSOFT.EVENTHUB"
| where Category == "ArchiveLogs"
| where TimeGenerated > ago(1h)
| summarize count() by OperationName, ResultDescription
| order by count_ desc

az storage blob list --account-name $SA -c capture --auth-mode login --prefix "$NS/$EH/" -o table

No files (with traffic flowing) → network path broken. Many 0-byte files → empty-archive noise. Fix: for the broken path, give the storage account a private endpoint or enable the trusted Microsoft services firewall exception; for the noise, set --skip-empty-archives true.

Mistake 5 & 6 — Checkpointing wrong (too often, or in the wrong order)

Checkpoint per event and every checkpoint is a Blob write — the storage account throttles (429s) and checkpoint latency dominates. Checkpoint before processing (or never) and a crash either loses in-flight events or replays the entire stream on restart.

Confirm. A restart that re-reads from the start of retention means checkpoints aren’t being written (or are written before processing). Storage 429s on the checkpoint container point at per-event checkpointing. Fix: call UpdateCheckpointAsync after successful, idempotent processing, and in batches (every N events or a few seconds). Tune N against your idempotency budget — larger N means more replay on restart but less storage load.

Mistake 7 — Assuming exactly-once for every ASA sink

The ASA “exactly-once” badge applies to processing and to delivery only for SQL Database, Cosmos DB, and Delta-based outputs. Write to plain Blob/JSON, Event Hubs output, or Service Bus and you get at-least-once — duplicates after any retry.

Confirm. Duplicate rows in a non-EOS sink; cross-reference the sink against the exactly-once matrix earlier. Fix: route to an exactly-once sink (SQL/Cosmos with upsert on a business key) or dedup downstream on a unique key. Never assume the badge covers your output.

Mistake 9 — Hot partition from a skewed key

A low-cardinality key (region, tenant) pins most traffic to a few partitions; their single readers saturate while other partitions sit idle, and lag grows on the hot ones only.

Confirm. Per-partition throughput is wildly uneven — one or two partitions near their share, the rest near zero. Fix: switch to a high-cardinality key (deviceId); if the hub already exists on Standard, re-partition via a new hub. A composite key (region:deviceId) preserves locality while restoring spread.

Mistake 12 & 14 — Private-network and consumer-group footguns

Disabling public access without a private endpoint and DNS zone cuts off every client that resolved the public FQDN. Sharing one consumer group across two unrelated apps makes them steal partition ownership from each other, so neither reads reliably.

Confirm. publicNetworkAccess = Disabled with no PE → clients time out resolving. Two apps with the same group name → ownership churn in logs. Fix: add the private endpoint + privatelink.servicebus.windows.net DNS zone for the first; give every downstream its own consumer group for the second. For deeper network-path debugging, see Azure Troubleshooting Methodology: Network, VM, Identity, Storage.

Best practices

Twelve rules that keep an Event Hubs pipeline out of the incident channel:

#	Practice	Why it matters
1	Choose the tier before provisioning	Unit, limits, and partition-mutability all differ
2	Size TUs on egress (`ingress × groups`), not ingress	Fan-out makes you egress-bound almost always
3	Set the auto-inflate ceiling as a cost guardrail	Prevents an unbounded bill on a spike
4	Over-provision partitions (32+) on Standard	Immutable later; partitions are nearly free
5	Use a high-cardinality partition key	Even spread + per-entity ordering, no hot partitions
6	One consumer group per downstream	Avoids ownership theft between apps
7	`EventProcessorClient` + Blob checkpoint store	Don’t hand-roll ownership/load-balance
8	Checkpoint after processing, in batches, idempotently	No silent loss, no storage throttling
9	Enable Capture with `skip-empty-archives` + `{PartitionId}`	Free replay; no empty-file noise
10	Validate the Kafka endpoint against your client’s feature use	Transactions/EOS aren’t supported
11	ASA: `TIMESTAMP BY ... OVER <key>` + explicit watermark	Event-time correctness; substreams
12	Match the ASA sink to the required guarantee	Exactly-once only to SQL/Cosmos/Delta

Two more that are operational rather than configuration: alarm on derived lag relative to retention (not an absolute number), and re-derive your TU math every time you add a consumer group.

Security notes

Event Hubs security is identity, network, and encryption — and the defaults are not the production posture.

Control	Default	Production setting	How
Auth model	SAS connection strings	Entra ID + RBAC / managed identity	`Azure Event Hubs Data Sender` / `...Data Receiver` roles
Local (SAS) auth	Enabled	Disabled where possible	`--disable-local-auth true`
TLS	1.0+ allowed historically	Minimum TLS 1.2	`minimumTlsVersion: '1.2'`
Public network access	Enabled	Disabled (private-only)	`--public-network-access Disabled` + PE
Network isolation	Public FQDN	Private endpoint	`privatelink.servicebus.windows.net`
Encryption at rest	Microsoft-managed keys	CMK if compliance requires	Premium/Dedicated + Key Vault
Capture storage auth	Account key	Managed identity	Grant the namespace identity Storage Blob Data Contributor
Diagnostic logging	Off	On to Log Analytics	Diagnostic settings → OperationalLogs, ArchiveLogs

The RBAC data roles are the ones to know — they replace shared keys:

Role	Grants	Assign to
`Azure Event Hubs Data Owner`	Full data-plane (send + receive + manage)	Admins / break-glass
`Azure Event Hubs Data Sender`	Send only	Producer apps
`Azure Event Hubs Data Receiver`	Receive only	Consumer apps / ASA

# Grant a producer app's managed identity send-only, scoped to one hub
PRINCIPAL=$(az webapp identity show -n producer-app -g rg-app --query principalId -o tsv)
az role assignment create --assignee "$PRINCIPAL" \
  --role "Azure Event Hubs Data Sender" \
  --scope $(az eventhubs eventhub show -g $RG --namespace-name $NS -n $EH --query id -o tsv)

Prefer Entra ID over connection strings everywhere, scope roles to the hub (not the namespace) where you can, and disable local auth once every client is on managed identity. For secret-handling where a connection string is unavoidable, store it in Key Vault and rotate it — see Azure Key Vault: Secret Rotation with Managed Identity.

Cost & sizing

What drives the Event Hubs bill, and how to keep it honest:

Cost driver	Billed on	Rough figure (varies by region)	How to control
Throughput units (Standard)	Per-TU-hour	~₹2,000–2,500 / TU / month	Size on egress; auto-inflate ceiling; scale down off-peak
Ingress events	Per million events	small per-million	Batch sends; avoid tiny events
Capture (Standard add-on)	Per-TU-hour add-on	~₹0.10 / TU / hour	Worth it for replay; included on Premium
Capture storage	ADLS storage + transactions	per-GB + per-10k ops	`skip-empty-archives`; lifecycle to cool/archive
Processing units (Premium)	Per-PU-hour	substantial fixed/PU	Right-size PUs with load tests
Capacity units (Dedicated)	Per-CU-month	large fixed floor	Only at 100+ MB/s or compliance
Stream Analytics	Per streaming-unit-hour	per-SU-hour	Right-size SUs; stop idle jobs
Storage egress (replay)	Per-GB read	per-GB	Replay only what you need; date-prune

Right-sizing levers, ranked by impact:

Lever	Effect on bill	Effort	Risk
Size TUs on egress (not over-provision)	High	Low	Under-size → throttling; keep headroom
Auto-inflate ceiling	Medium	Trivial	Too low → throttling on spikes
Scheduled scale-down off-peak	Medium	Medium	Cron/automation to lower TUs
`skip-empty-archives`	Medium (storage)	Trivial	None
Lifecycle Avro to cool/archive	Medium (storage)	Low	Slower replay from archive tier
Standard vs Premium choice	High	Planning	Premium floor wasted at low volume
Right-size ASA SUs	Medium	Low	Too few → watermark delay

There is no free tier for Event Hubs Standard, but the lab above runs on 1 TU + LRS storage for a few rupees if you delete the resource group when done. A worked baseline: a 6-TU Standard namespace with Capture and one ASA job (small SU count) lands around ₹14,000–18,000/month — the Cartos scenario’s figure. Premium starts well above that as a fixed floor, justified only when you need predictable latency, >7-day retention, or tenancy isolation.

Interview & exam questions

Mapped to DP-203 (Data Engineering on Azure) and AZ-204/AZ-305 where messaging appears.

Why is Event Hubs egress capacity double its ingress per TU? Because every event is typically read by multiple consumer groups, and each group reads the full stream — so egress demand is ingress × consumer_groups. The 2:1 ratio assumes meaningful fan-out; you still size against the larger of your two numbers.
Can you change partition count after creating a hub? On Standard, no — it is immutable, and the only path to more partitions is a side-by-side migration to a new hub. On Premium/Dedicated you can increase it (never decrease), but existing data isn’t rebalanced and key mapping shifts for new sends.
Why checkpoint after processing rather than before? A checkpoint means “processed through offset X.” Checkpoint before processing and a crash loses every in-flight event between the checkpoint and the failure — silent data loss. After-processing checkpointing makes a restart resume correctly given idempotent work.
Event Hubs delivers at-least-once. What does that force on your consumer? Idempotent processing. After a crash you reprocess from the last checkpoint, so you may see events more than once; dedup on a business key or design effects to be replay-safe.
What does Capture write, and what format constraint does it impose? It archives the raw stream to Blob/ADLS as Avro only — no JSON/Parquet option. If your lake standard is Parquet, run a downstream conversion. Always enable skip-empty-archives and include {PartitionId} in the name format.
A consumer is scaled to 20 instances on a 4-partition hub. How many actively read? Four — at most one active reader owns a partition per consumer group, so 16 instances sit idle owning nothing. Parallelism is capped by partition count.
When does auto-inflate fail to help you? When you are egress-bound. Auto-inflate triggers on ingress throttling, so if consumer-group fan-out is your constraint, it may never fire while consumers throttle. Alarm on egress directly and raise TUs manually.
Which ASA sinks give true end-to-end exactly-once? Azure SQL Database, Cosmos DB, and Delta-based outputs (Synapse/Blob via the v2 path). Plain Blob/JSON, Event Hubs output, and Service Bus are at-least-once — the sink must dedup.
What’s the single highest-leverage clause in an ASA query at scale, and why? TIMESTAMP BY <ts> OVER <key> — it applies event-time per partition key (substreams), so one slow producer can’t hold the global watermark hostage, and it enables embarrassingly-parallel processing aligned to the input partitions.
How do you derive consumer lag without a built-in metric? Compare each partition’s LastEnqueuedSequenceNumber against your checkpointed sequence number; a growing gap is lag. Critically, compare that gap to the retention window — lag exceeding retention means permanent data loss.
What Kafka features does the Event Hubs endpoint NOT support? The transactions API, idempotent-producer EOS, Kafka-semantics compacted topics, and broker-level admin operations. It’s a clean lift-and-shift for produce/consume and consumer groups; not a drop-in for a transaction-dependent Kafka Streams app.
Why might “private networking” break Capture, and how do you fix it? Disabling public access cuts the namespace’s path to the storage account too. Give the storage account a private endpoint, or enable the trusted Microsoft services firewall exception, so Capture can keep writing Avro.

Quick check

You have 2 MB/s ingress and 4 consumer groups. How many TUs (whichever side dominates) before headroom?
True or false: increasing partition count is a simple knob on a Standard hub.
Where do you call UpdateCheckpointAsync relative to your processing logic, and why?
Which two Avro-path requirements must --archive-name-format satisfy for safe replay?
Name one ASA sink that is exactly-once and one that is at-least-once.

Answers

Egress dominates: 4 × 2 MB/s = 8 MB/s egress ÷ 2 = 4 TU (vs 2 TU by ingress). Size for 4 TU, then add headroom (e.g. 5–6).
False. Partition count is immutable on Standard; only Premium/Dedicated allow an increase (never decrease), and existing data isn’t rebalanced.
After successful, idempotent processing, in batches — so a crash never loses in-flight events and the Blob store isn’t throttled by per-event writes.
It must include {PartitionId} (for per-partition ordering on replay) and the {Namespace}/{EventHub} tokens; time tokens are optional but enable date/time pruning.
Exactly-once: Azure SQL Database (or Cosmos DB / Delta). At-least-once: plain Blob/JSON (or Event Hubs output / Service Bus).

Glossary

Term	Definition
Event Hubs	Managed, partitioned event-streaming service (a distributed append-only log) with Kafka-protocol compatibility.
Namespace	The container and capacity boundary for one or more hubs; TUs/PUs are shared here.
Event hub	A single partitioned log — the Kafka “topic” equivalent — you produce to and consume from.
Throughput unit (TU)	Standard capacity unit: 1 MB/s (or 1000 events/s) ingress, 2 MB/s (or 4096 events/s) egress.
Processing unit (PU)	Premium capacity reservation (CPU/memory) with predictable latency and no per-second throttle.
Capacity unit (CU)	Dedicated single-tenant capacity unit for the highest scale and isolation.
Partition	An ordered, append-only sub-log; ordering holds within it, and one reader owns it per consumer group.
Partition key	A value hashed to choose a partition, giving per-key ordering and (with high cardinality) even spread.
Consumer group	An independent cursor over the stream; each downstream gets its own.
Checkpoint	A persisted “processed through offset X” marker in a Blob store that makes restart resumable.
`EventProcessorClient`	The SDK client that distributes partition ownership, load-balances, and checkpoints.
Capture	The feature that auto-archives the raw stream to Blob/ADLS as Avro for replay and audit.
Auto-inflate	Automatic upward TU scaling (ingress-triggered) up to a configured ceiling; never scales down.
Kafka endpoint	The wire-compatible broker on port 9093 that lets Kafka clients connect with no code change.
Stream Analytics (ASA)	A SQL-over-streams engine with temporal windows and exactly-once delivery to specific sinks.
Watermark	How long ASA waits for late (straggler) events before closing a window.
Retention	How long events are kept in the hub; lag exceeding it is permanent data loss.

Next steps

Azure Cosmos DB: Partition-Key Design & RU Optimization — the same partitioning intuition, and a common exactly-once ASA sink.
Azure Data Factory, Synapse & Microsoft Fabric Deep Dive — batch/lakehouse processing of the captured Avro and Avro→Parquet conversion.
Azure Service Bus: Sessions, Dedup & Dead-Letter Patterns — when you need transactional messaging instead of a streaming firehose.
Backpressure & Flow Control in Streaming Systems — the general theory behind the throttling and lag dynamics here.
Azure Private Endpoints & Private DNS at Scale — locking the namespace onto a VNet without breaking Capture.