
Azure Functions and Serverless Patterns: Event-Driven Compute

A team ran a webhook handler on a dedicated VM. The VM cost money every hour of every day even though the endpoint received a few thousand calls between 9am and 6pm and nothing overnight. Worse, the VM needed patching, the disk filled with logs nobody read, and a kernel update once took the endpoint down for forty minutes. Moving the handler to Azure Functions cut the compute bill by roughly 90%, removed the patching entirely, and — because the platform scaled the handler from zero to dozens of instances on its own — survived a traffic spike that would have toppled the single VM. That is the serverless trade in one story: you stop renting a machine and start renting executions, and you hand the platform the jobs (provisioning, scaling, patching, load-balancing) you used to do by hand.

Azure Functions is Azure’s Functions-as-a-Service offering. You write a function (a small piece of code with a single entry point), declare the event that triggers it (an HTTP request, a queue message, a blob upload, a timer, a Cosmos DB change), and declare the inputs and outputs it binds to (read a document, write to a queue, push to Event Hubs), so you write logic, not client boilerplate. The platform runs your function only when its trigger fires, scales the number of concurrent instances to match the event rate, and, on the serverless plans, bills you per execution and per gigabyte-second of memory, dropping to zero when nothing is happening. This is the natural home for glue code, automation, event processing, lightweight APIs and scheduled jobs.

This article is the working reference a senior engineer keeps open. We go plan by plan (Consumption, Flex Consumption, Premium, Dedicated and Container Apps), trigger by trigger and binding by binding, through the scale controller that decides how many instances you get and the cold start that makes the first request slow, into concurrency and partitioning (the knobs that decide throughput and ordering), and across the Durable Functions patterns — function chaining, fan-out/fan-in, async HTTP, monitor, human interaction and aggregator — that let stateless functions run stateful, long-running workflows. Every concept carries the real limits (timeouts, payload sizes, instance caps), an az/Bicep snippet where it applies, and — because half of all Functions incidents are the same dozen mistakes — a symptom→cause→confirm→fix playbook. Read the prose once; keep the tables open when you are building or on call.

By the end you will know which plan to pick and what each one actually fixes, why your function fired twice and how to make that safe, why messages piled up in a poison queue at 2am, why a Premium plan still cold-started, and how to wire identity, networking and observability so the thing is production-grade rather than a demo that happened to ship.

What problem this solves

Most real work in a cloud system is event-shaped, not request-shaped. A file lands in storage and needs a thumbnail. An order message arrives on a queue and needs validating. A timer fires at 02:00 and a cleanup must run. A row changes in Cosmos DB and a downstream cache must update. A webhook calls in and a record must be written. None of these need a server sitting idle waiting; they need code that runs when the event happens and then stops. Running that code on always-on infrastructure (a VM, an always-warm App Service, a Kubernetes deployment) means paying 24/7 for capacity used a fraction of the time, plus owning the patching, scaling rules and load-balancing yourself.

Without serverless, the pain is concrete: you over-provision for the peak (a flash sale, a nightly batch) and waste money the other 23 hours; or you under-provision and the spike takes you down. You write the same connection-management, retry and dead-letter plumbing for every integration. You patch OS and runtime on a schedule that competes with shipping features. You build autoscaling rules and hope they react fast enough. And when traffic genuinely goes to zero overnight, you keep paying anyway.

Who hits this: anyone building integrations and automation (the classic “glue” between SaaS, queues, storage and databases), event processors (image/file pipelines, IoT and telemetry, change-feed reactors), lightweight or spiky APIs (webhooks, back-office endpoints, bursty public APIs), and scheduled jobs (reports, cleanups, syncs). Azure Functions removes the server from all of them — you provide the handler and the trigger, Azure provides everything else. But it is not a universal hammer: long-running compute, very low-latency APIs that cannot tolerate any cold start, and workloads that need persistent local state fit other models better, and a big part of using Functions well is knowing where its edges are.

The whole field, framed before the deep dive — the event source, the question it forces, and where Functions fits:

Workload shape What triggers it The serverless win When Functions is wrong
Webhook / lightweight API HTTP request Scale-to-zero; pay per call; no VM Strict sub-100 ms p99 with no cold-start tolerance
Event/stream processing Queue, Event Hubs, Service Bus, Event Grid Auto-scale to the backlog; built-in checkpointing Heavy stateful stream joins (use Stream Analytics/Flink)
File / blob pipeline Blob trigger / Event Grid on Storage Runs per file; fans out automatically Very high-rate blob events (prefer Event Grid source)
Scheduled job Timer (CRON) No always-on host for a nightly task Sub-second scheduling precision
Change reactor Cosmos DB / SQL change feed Reacts to data changes without polling Need transactional consistency across writes
Long workflow / orchestration Durable Functions Stateful, long-running, checkpointed Single sub-second synchronous call

Learning objectives

By the end of this article you can:

Prerequisites & where this fits

You should be comfortable with the Azure basics: a resource group, an App Service plan vs a serverless plan, running az in Cloud Shell, reading JSON output, and the idea of a managed identity. Familiarity with HTTP, queues and JSON helps; you do not need deep Kubernetes or messaging-broker knowledge — we build it up. A function app always needs a backing storage account (it stores triggers’ state, the Durable task hub, and runtime metadata there), so a passing familiarity with Azure Storage account fundamentals is useful.

This sits in the Compute / Serverless track and is the event-driven sibling of the request-driven PaaS world. The decision of whether serverless functions are the right compute at all lives upstream in Azure App Service vs Container Apps vs AKS; read that first if you are still choosing a model. Once you are running Functions in production, the operational reflexes transfer directly from Troubleshooting Azure App Service: 502/503, Cold Starts & Restart Loops — the front-end/worker mental model and Application Insights workflow are the same. Functions almost always read secrets via Azure Key Vault: Secrets, Keys & Certificates and config via Azure App Configuration: Feature Flags, Dynamic Config & Key Vault References, and you observe them through Azure Monitor & Application Insights for Observability. When a function needs private outbound to a database, Azure Private Endpoint vs Service Endpoint is the networking decision it forces.

A quick map of who owns what when a function misbehaves, so you escalate to the right place fast:

Layer What lives here Who usually owns it Failure classes it causes
Event source (queue/hub/blob) Messages, partitions, backlog App / platform team Backlog growth, duplicate delivery, ordering
Trigger + scale controller Instance count decision, polling Microsoft (platform) Slow scale-out, no scale (host down), cold start
Function host (runtime) Your code, bindings, concurrency App / dev team Crash, timeout, throttled downstream, OOM
Backing storage account Trigger state, Durable task hub App + platform Host won’t start, Durable stalls, throttling
Identity & config Managed identity, KV refs, settings App + platform Boot failure, 403 to dependencies
Network (VNet / PE) Outbound to DB/PaaS, DNS Platform + network Timeouts, name-resolution failures

Core concepts

Six mental models make every later section obvious.

A function is a handler plus a trigger. The unit of work is a function: one entry point with exactly one trigger (the event that starts it) and zero or more bindings (declarative inputs and outputs). One or more functions live inside a function app, which is the deployment, scaling and configuration boundary — the function app is what you create, scale, give an identity, and put on a plan. All functions in an app share the app’s plan, settings, identity and storage account.

The trigger defines the contract; bindings remove the boilerplate. A trigger delivers an event payload and starts execution (an HTTP request body, a queue message, a blob stream). An input binding hands your function data pulled from a service before it runs (a Cosmos DB document keyed off the trigger); an output binding writes your function’s return value to a service after it runs (append to a queue, upsert a document). Bindings are declared in attributes/decorators or function.json, so the SDK manages the client, connection and serialization for you — you read and write parameters, not SDK objects.

The platform decides scale; you decide concurrency. Azure does not run your function on a server you manage. A component called the scale controller watches each trigger’s signal (HTTP request rate, queue length, Event Hubs lag) and adds or removes instances (worker sandboxes) to keep up — from zero to the plan’s maximum. Within each instance, concurrency settings decide how many invocations run at once. Scale (instances) is the platform’s job; concurrency (per-instance parallelism) is yours, and the two multiply into throughput.

Stateless by default; stateful on purpose. A plain function is stateless — it must not rely on in-memory state surviving between invocations, because the next invocation may run on a different instance (or the instance may have been recycled). State lives outside the function (a database, a queue, a cache). When you genuinely need stateful, long-running coordination — “call A, then B, wait for approval, then C” running for minutes, hours or days — Durable Functions provides it via an orchestrator that checkpoints its progress to storage and replays deterministically.
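To make "checkpoints its progress to storage and replays deterministically" concrete, here is a toy model in plain Python. It is not the Durable Functions API: `orchestrator`, `run_orchestration`, `ACTIVITIES` and the in-memory `history` list are hypothetical names standing in for the orchestrator function, the extension's runner, activity functions and the task hub.

```python
# Toy model of checkpoint-and-replay; this is NOT the Durable Functions API.
from typing import Callable, Dict, Generator, List, Optional, Tuple

# Hypothetical activities: plain functions standing in for activity functions.
ACTIVITIES: Dict[str, Callable[[int], int]] = {
    "double": lambda x: x * 2,
    "add_one": lambda x: x + 1,
}

executed: List[str] = []  # records every real (non-replayed) activity execution


def orchestrator(start: int) -> Generator[Tuple[str, int], int, int]:
    """Deterministic workflow: each yield asks the runner to call an activity."""
    doubled = yield ("double", start)
    result = yield ("add_one", doubled)
    return result


def run_orchestration(history: List[int], start: int,
                      max_new_steps: Optional[int] = None) -> Optional[int]:
    """Replay recorded results from history, then execute any remaining steps.

    max_new_steps simulates the host being recycled after that many new
    activity executions; the function then returns None (not finished).
    """
    gen = orchestrator(start)
    step = 0
    new_steps = 0
    try:
        request = next(gen)
        while True:
            if step < len(history):
                result = history[step]          # replay: no activity call
            else:
                if max_new_steps is not None and new_steps >= max_new_steps:
                    return None                 # simulated recycle mid-workflow
                name, arg = request
                executed.append(name)
                result = ACTIVITIES[name](arg)
                history.append(result)          # checkpoint the result
                new_steps += 1
            step += 1
            request = gen.send(result)
    except StopIteration as done:
        return done.value
```

Running it once with `max_new_steps=1` and then again with the same `history` list finishes the workflow without calling the first activity a second time. That only works because the orchestrator code takes the same path on every replay, which is why orchestrators must stay deterministic.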

Delivery is at-least-once; design for it. Queue, Service Bus, Event Hubs and Event Grid triggers deliver at least once — under retries, redelivery or scale events, your function can see the same message more than once and events can arrive out of order. This is not a bug to fix; it is a property to design around with idempotency (processing the same event twice has the same effect as once) and poison/dead-letter handling (a message that keeps failing is set aside, not retried forever).
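A minimal sketch of the idempotency idea, assuming each message carries a stable ID. The names `handle_payment`, `processed_ids` and `balances` are hypothetical, and the in-memory set and dict stand in for a durable dedupe store and the system of record.

```python
from typing import Dict, Set

processed_ids: Set[str] = set()   # stand-in for a durable dedupe store
balances: Dict[str, int] = {}     # stand-in for the system of record


def handle_payment(message_id: str, account: str, amount: int) -> bool:
    """Apply a payment once; return False if this delivery was a duplicate."""
    if message_id in processed_ids:
        return False                   # redelivery: the effect is already applied
    balances[account] = balances.get(account, 0) + amount
    processed_ids.add(message_id)      # record only after the effect succeeded
    return True
```

In a real handler the effect and the "seen" record would need to be written atomically (for example in one database transaction), otherwise a crash between the two steps reopens the duplicate window.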

No server, but always a storage account. “Serverless” means you do not manage servers — it does not mean there is no state. Every function app is bound to a storage account (the AzureWebJobsStorage connection) that holds runtime metadata, trigger leases/checkpoints, the Durable task hub, and (for some plans) the deployment package. If that storage account is unreachable, throttled, or its keys rotate without updating the setting, the host fails to start — a surprising amount of “Functions is down” is really “the storage account is unhappy.”

The vocabulary in one table

Pin down every moving part before the deep sections. The glossary repeats these for lookup; this is the mental model side by side:

Term One-line definition Where it lives Why it matters
Function One handler with one trigger + bindings Inside a function app The unit of execution and billing
Function app Deployment/scaling/config boundary On a plan What you create, scale, give identity
Trigger The event that starts a function Per function Defines payload + scaling signal
Binding Declarative input/output to a service Per function Removes client boilerplate
Hosting plan Where/how the app runs and is billed Per function app Decides scale, cold start, cost, networking
Scale controller Platform component that adds/removes instances Microsoft-managed Decides how fast you scale to load
Instance A worker sandbox running your app On the plan Cold start happens when a new one spins up
Concurrency Invocations running at once per instance host.json / settings Throughput vs downstream pressure
Cold start First-request latency on a fresh instance Instance lifecycle Slow first call; mitigated, not eliminated
AzureWebJobsStorage The app’s backing storage connection App setting Host won’t start if it’s broken
Durable Functions Stateful orchestration on top of Functions Extension + task hub Long-running, checkpointed workflows
Task hub Durable’s state store (queues + tables) In the storage account Where orchestration progress is persisted
Poison / dead-letter Where repeatedly-failing messages go Queue/Service Bus Stops infinite retry of a bad message
Managed identity The app’s Entra identity for auth On the function app Passwordless access to KV/DB/Storage

Hosting plans: pick the one that fits, not the cheapest by default

The single highest-leverage decision is the hosting plan. It determines how the app scales, whether it ever scales to zero, how cold starts behave, what networking it can do, the maximum timeout, and how you are billed. Picking “Consumption because it’s cheapest” and then fighting cold starts and VNet limits for a month is the most common early mistake.

There are five plans in practice. Consumption is the original serverless plan: scale-to-zero, pay per execution, modest cold starts, a hard 10-minute timeout. Flex Consumption is the modern serverless plan: scale-to-zero and fast per-instance concurrency control, VNet integration, alwaysReady instances to kill cold starts, and per-instance memory you choose — it is the default new-build recommendation. Premium (Elastic Premium, EP) gives pre-warmed instances (no cold start), VNet integration, longer/unbounded timeouts and more memory, billed per vCPU/GB allocated. Dedicated (App Service plan) runs Functions on a plan you already pay for (good for steady load or co-locating with web apps), with no scale-to-zero. Container Apps hosts a containerized function app on the Container Apps/KEDA platform when you want microservices, Dapr, or container parity.

Lay the five plans side by side on the axes that actually decide the choice:

Plan Scale-to-zero Cold starts Max timeout VNet integration Billing model Best for
Consumption Yes Yes (modest) 5 min default, 10 min max No (legacy: limited) Per-execution + GB-s Spiky/low-traffic glue, demos
Flex Consumption Yes Yes — killed with alwaysReady Configurable, long Yes (built-in) Per-execution + GB-s + alwaysReady New serverless builds (default)
Premium (EP1–EP3) No (min 1) None (pre-warmed) Unbounded (default 30 min) Yes Per vCPU/GB allocated (always-on) Steady + need warm + VNet + long runs
Dedicated (App Service) No Per App Service rules Unbounded (Always On) Yes App Service plan (instance-hours) Co-locate with web apps; predictable load
Container Apps Yes (to 0 via KEDA) Yes (scale-from-zero) Long (revision-based) Yes vCPU/GB per second Containers, Dapr, microservices parity

The same plans as a capability grid against the features people actually need:

Capability Consumption Flex Consumption Premium (EP) Dedicated Container Apps
Scale to zero Yes Yes No No Yes
Pre-warmed / always-ready No Yes (alwaysReady) Yes (pre-warmed count) n/a (Always On) No (min replicas)
VNet integration No Yes Yes Yes Yes
Per-instance concurrency control Limited Yes Yes Yes Yes (KEDA)
Choose instance memory No Yes Yes (EP SKU) Yes (SKU) Yes
Unbounded execution time No (10 min) Long Yes Yes Long
Deployment slots No (evolving) Yes Yes Revisions
Linux + Windows Both Linux Both Both Linux (containers)

And the decision as a table — match what you’re feeling to the plan that fixes it:

If you need… Because… Pick
Cheapest possible for bursty/low traffic You pay nothing at idle Consumption (or Flex)
Scale-to-zero plus no cold start plus VNet Modern serverless, private deps Flex Consumption
Zero cold start with steady load and long runs Latency-sensitive, > 10 min jobs Premium (EP)
To run on a plan you already pay for Co-located web apps, steady load Dedicated
Container image, Dapr, or K8s-style ops Microservice parity Container Apps
Strict isolation / dedicated tenancy Compliance, ASE-style Premium on ASE / Dedicated

Create a Flex Consumption app (the modern default) with az:

RG=rg-fn-prod
LOC=centralindia
STG=stfnprod$RANDOM            # storage account (globally unique)
APP=fn-orders-prod-$RANDOM     # function app (globally unique)

az group create -n $RG -l $LOC -o table
az storage account create -n $STG -g $RG -l $LOC --sku Standard_LRS -o table

# Flex Consumption: choose runtime, version, instance memory, and region
az functionapp create -n $APP -g $RG \
  --storage-account $STG \
  --flexconsumption-location $LOC \
  --runtime dotnet-isolated --runtime-version 8.0 \
  --instance-memory 2048 \
  -o table

The equivalent in Bicep, with system-assigned identity and an alwaysReady instance to remove cold start on the HTTP path:

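param location string = resourceGroup().location
// Assumes a storage account with the symbolic name 'stg' is declared elsewhere in this template.

resource plan 'Microsoft.Web/serverfarms@2023-12-01' = {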
  name: 'flex-orders'
  location: location
  sku: { tier: 'FlexConsumption', name: 'FC1' }
  properties: { reserved: true } // Linux
}

resource fnApp 'Microsoft.Web/sites@2023-12-01' = {
  name: 'fn-orders-prod'
  location: location
  kind: 'functionapp,linux'
  identity: { type: 'SystemAssigned' }
  properties: {
    serverFarmId: plan.id
    functionAppConfig: {
      runtime: { name: 'dotnet-isolated', version: '8.0' }
      scaleAndConcurrency: {
        instanceMemoryMB: 2048
        maximumInstanceCount: 100
        alwaysReady: [ { name: 'http', instanceCount: 1 } ] // warm pool for HTTP
      }
      deployment: {
        storage: {
          type: 'blobContainer'
          value: '${stg.properties.primaryEndpoints.blob}deployments'
          authentication: { type: 'SystemAssignedIdentity' }
        }
      }
    }
  }
}

Runtime, language and worker model

Independent of plan, you pick a runtime stack and version. .NET has two models: isolated worker (your function runs in its own process out-of-proc from the host — the recommended model, decoupled from the host’s .NET version) and the legacy in-process model (being retired). The other stacks — Node.js, Python, Java, PowerShell — always run out-of-process via the language worker. Pick the version deliberately: an unsupported runtime version blocks deploys and security updates.

Stack Models / notes Trigger style When to pick
.NET (isolated) Out-of-proc; decoupled from host Attributes New .NET builds (recommended)
.NET (in-process) Legacy; tied to host version Attributes Existing apps only; migrate off
Node.js (v4 model) Code-first programming model app.http(...) etc. JS/TS teams, fast iteration
Python (v2 model) Decorator-based @app.route etc. Data/ML glue, scripting
Java Annotations @FunctionName JVM shops, Spring-adjacent
PowerShell Scripting function.json Ops automation, Azure mgmt
Custom handler Any language over HTTP Custom handler contract Go/Rust/other; container only realistically

Triggers: the event that starts a function

Every function has exactly one trigger. The trigger decides the payload shape, the scaling signal the controller watches, the delivery guarantee, and the failure/retry behaviour. Knowing each trigger’s real limits is the difference between a pipeline that holds under load and one that silently drops or duplicates.

The full trigger catalogue, with the property that bites:

Trigger Fires on Delivery guarantee Scaling signal Key limit / gotcha
HTTP Inbound HTTP request Synchronous (caller-driven) Request rate Response within timeout; large bodies via stream
Timer CRON schedule (NCRONTAB) Singleton (one instance) Time Missed runs are not back-filled individually (one late run with IsPastDue); 6-field CRON incl. seconds
Queue Storage New message in a queue At-least-once Queue length 64 KB message; 5 dequeues → poison queue
Service Bus Message in queue/subscription At-least-once Active message count Lock duration; sessions for ordering; 256 KB/1 MB (Premium)
Event Hubs Event batch on a partition At-least-once Partition lag (lease) One instance per partition; checkpointing; ordering per partition
Event Grid Discrete event (HTTP push) At-least-once Event push Handshake validation; retries with backoff; dead-letter to blob
Blob (polling) New/updated blob At-least-once (eventual) Scan / receipts High latency at scale → use Event Grid source
Blob (Event Grid) Blob event via Event Grid At-least-once Event push Near-real-time; the production choice for blobs
Cosmos DB Change feed (inserts/updates) At-least-once Lease lag Needs a lease container; no deletes in feed
Durable orchestration Orchestrator/activity/entity Internal (replay) Control queue Determinism rules; managed by the extension

HTTP trigger

The HTTP trigger turns a function into a web endpoint. It is synchronous (the caller waits for your response), so the request must complete within the platform/front-end timeout (about 230 seconds at the load balancer, far less than the function timeout). Configure the route, methods, and authorization level (the function-key model): anonymous (no key), function (a function or host key), admin (the master key). For real auth, put Easy Auth/Entra ID or API Management/Application Gateway in front rather than relying on function keys alone.

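# Read a function's invoke URL
az functionapp function show -g $RG -n $APP --function-name HttpOrders \
  --query "invokeUrlTemplate" -o tsv
# Read the function's default key (treat the output as a secret)
az functionapp function keys list -g $RG -n $APP --function-name HttpOrders \
  --query "default" -o tsv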
Setting Values Default When to change Gotcha
authLevel anonymous / function / admin function anonymous behind APIM/Entra Keys are not real auth; rotate them
methods GET/POST/PUT/… GET, POST Restrict to what you accept Over-permissive methods = attack surface
route template e.g. orders/{id} function name Clean REST routing Route collisions return 404
Response timeout bounded by LB ~230 s Long work → return 202 + async Don’t block; use Durable async pattern
Max request body streamable; ~100 MB practical Large uploads Buffer vs stream; memory pressure
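The "return 202 + async" advice in the table can be sketched without any Azure SDK. This is a plain-Python simulation, not the Durable Functions HTTP API: `start_job`, `complete_job`, `get_status`, the `/api/status/...` route and the in-memory `jobs` dict are all hypothetical.

```python
import uuid
from typing import Dict, Tuple

jobs: Dict[str, dict] = {}   # stand-in for durable job state (hypothetical)


def start_job(payload: dict) -> Tuple[int, dict, dict]:
    """Accept the work and return immediately with 202 and a status URL."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "Running", "input": payload, "output": None}
    headers = {"Location": f"/api/status/{job_id}", "Retry-After": "5"}
    return 202, headers, {"id": job_id}


def complete_job(job_id: str, output: dict) -> None:
    """Called by the background worker when the long-running work finishes."""
    jobs[job_id].update(status="Completed", output=output)


def get_status(job_id: str) -> Tuple[int, dict, dict]:
    """Polling endpoint: 202 while running, 200 with the result when done."""
    job = jobs.get(job_id)
    if job is None:
        return 404, {}, {"error": "unknown job"}
    if job["status"] == "Running":
        headers = {"Location": f"/api/status/{job_id}", "Retry-After": "5"}
        return 202, headers, {"status": "Running"}
    return 200, {}, {"status": "Completed", "output": job["output"]}
```

The caller never holds a connection open for the long work, so the roughly 230-second front-end limit stops mattering; it polls the `Location` URL until it sees a 200.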

Timer trigger

A timer fires on an NCRONTAB schedule: a six-field CRON that includes seconds ({second} {minute} {hour} {day} {month} {day-of-week}). It is a singleton: only one instance runs the timer (coordinated via a storage lock), so a scaled-out app does not fire the timer N times. Missed occurrences (host was down) are not back-filled one by one; with schedule monitoring (UseMonitor, on by default for schedules with an interval of one minute or more) the host fires a single late run with IsPastDue set once it is back. Set RunOnStartup only for development: it fires on every restart/scale event, which can surprise you.

// .NET isolated: every day at 02:00:00 (note the leading seconds field)
[Function("NightlyCleanup")]
public void Run([TimerTrigger("0 0 2 * * *")] TimerInfo timer) { /* ... */ }
CRON example Meaning
0 */5 * * * * Every 5 minutes
0 0 * * * * Every hour, on the hour
0 0 2 * * * Every day at 02:00
0 30 9 * * 1-5 09:30, Monday–Friday
*/30 * * * * * Every 30 seconds
0 0 0 1 * * Midnight on the 1st of each month
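A common slip is pasting a five-field Unix CRON where the six-field form is expected, which silently shifts every field by one. The helper below is a small sketch (the name `split_ncrontab` is hypothetical, and it only labels fields; it does not validate ranges or compute next run times). It assumes you want the six-field form used throughout this article and rejects anything else.

```python
from typing import Dict

NCRONTAB_FIELDS = ("second", "minute", "hour", "day", "month", "day_of_week")


def split_ncrontab(expression: str) -> Dict[str, str]:
    """Label the six fields of an NCRONTAB expression, or raise ValueError."""
    parts = expression.split()
    if len(parts) != len(NCRONTAB_FIELDS):
        raise ValueError(
            f"expected {len(NCRONTAB_FIELDS)} fields (seconds first), "
            f"got {len(parts)}: {expression!r}"
        )
    return dict(zip(NCRONTAB_FIELDS, parts))
```

For example, `split_ncrontab("0 30 9 * * 1-5")` labels `30` as the minute and `9` as the hour, matching the 09:30 weekday row in the table.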

Queue Storage trigger

Fires when a message lands in an Azure Storage queue. Delivery is at-least-once; a message that fails processing is retried up to 5 times (default maxDequeueCount), then moved to a poison queue named <queue>-poison. Messages are capped at 64 KB (base64 ~48 KB of payload) — for larger payloads, store the blob and queue a pointer. Tune batch size and concurrency in host.json.

{
  "extensions": {
    "queues": {
      "batchSize": 16,
      "newBatchThreshold": 8,
      "maxDequeueCount": 5,
      "visibilityTimeout": "00:00:30",
      "maxPollingInterval": "00:00:02"
    }
  }
}
Setting What it does Default Trade-off
batchSize Messages fetched per instance at once 16 Higher = throughput, more memory/downstream load
newBatchThreshold Refill trigger (fetch more when below) batchSize/2 Controls steady-state concurrency
maxDequeueCount Retries before poison queue 5 Lower = fail fast; higher = ride transient errors
visibilityTimeout Delay before a failed message becomes visible again for retry 0 Too short = rapid retries against a still-failing dependency
maxPollingInterval Backoff when the queue is empty 1 min Lower = faster pickup, more storage transactions
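The 64 KB cap and the "queue a pointer" advice can be checked with a few lines of arithmetic. This sketch assumes the payload is Base64-encoded before it is queued, as the section describes; `fits_in_queue_message`, `to_queue_body` and the `blob_url` argument are hypothetical names, and the small size of the JSON envelope itself is ignored.

```python
import base64

QUEUE_MESSAGE_LIMIT_BYTES = 64 * 1024  # Storage queue message cap quoted above


def fits_in_queue_message(payload: bytes) -> bool:
    """True if the payload still fits under the cap once Base64-encoded."""
    return len(base64.b64encode(payload)) <= QUEUE_MESSAGE_LIMIT_BYTES


def to_queue_body(payload: bytes, blob_url: str) -> dict:
    """Inline small payloads; for large ones, queue a pointer to a blob.

    blob_url is hypothetical: it is where the caller has already uploaded
    the payload. The envelope's own overhead is not counted here.
    """
    if fits_in_queue_message(payload):
        return {"kind": "inline", "data": base64.b64encode(payload).decode("ascii")}
    return {"kind": "pointer", "blob": blob_url}
```

Base64 turns every 3 bytes into 4 characters, which is where the "about 48 KB of payload" figure comes from: 48 KB of raw bytes encodes to exactly 64 KB.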

Service Bus trigger

For enterprise messaging — ordering (sessions), dead-lettering, transactions, topics/subscriptions — use Service Bus rather than Storage queues. Delivery is at-least-once with a lock (PeekLock): the message is locked while you process it, and you must finish before the lock duration expires or it’s redelivered. Use sessions for FIFO ordering within a key. Failed messages go to the built-in dead-letter sub-queue after maxDeliveryCount. Standard tier caps messages at 256 KB, Premium at 1 MB (or 100 MB with large-message support).

{
  "extensions": {
    "serviceBus": {
      "maxConcurrentCalls": 16,
      "maxConcurrentSessions": 8,
      "prefetchCount": 0,
      "autoCompleteMessages": true,
      "maxAutoLockRenewalDuration": "00:05:00"
    }
  }
}
Setting What it does Default When to change
maxConcurrentCalls Parallel non-session messages per instance 16 Lower to protect a fragile downstream
maxConcurrentSessions Parallel sessions per instance 8 Tune for ordered-stream fan-out
prefetchCount Messages cached locally ahead of processing 0 Higher = throughput, risk of lock expiry
autoCompleteMessages Auto-complete on success true Set false for manual settlement control
maxAutoLockRenewalDuration Auto-renew the lock for long work 5 min Raise for long handlers; cap to avoid stuck locks

Event Hubs trigger

For high-throughput telemetry/streaming, Event Hubs partitions the stream; the trigger assigns one instance per partition (via leases) and processes events in batches, checkpointing progress so a restart resumes where it left off — but a redelivered batch after a crash means at-least-once and possible reprocessing. Ordering is per-partition only. Max parallelism equals the partition count, so partitions are your scale ceiling — size them up front (they’re hard to change later).

{
  "extensions": {
    "eventHubs": {
      "maxEventBatchSize": 100,
      "batchCheckpointFrequency": 1,
      "prefetchCount": 300
    }
  }
}
Concept What it controls Limit / note
Partition count Max concurrent instances Set at creation; 1–32 (more on Premium/Dedicated)
maxEventBatchSize Events per invocation Bigger batch = throughput, larger memory
batchCheckpointFrequency Batches between checkpoints Higher = fewer storage writes, more reprocessing on crash
Throughput units / PUs Ingress/egress capacity TUs on Standard; PUs on Premium; CUs on Dedicated
Ordering FIFO per partition only No global ordering across partitions
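The two properties above (ordering only within a partition, and partition count as the parallelism ceiling) can be illustrated with a small simulation. The hash below is an illustrative stand-in, not the hashing Event Hubs uses internally, and `partition_for`, `route` and `effective_parallelism` are hypothetical names.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, int]  # (partition key, sequence number)


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition with a stable, illustrative hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partition_count


def route(events: Iterable[Event], partition_count: int) -> Dict[int, List[Event]]:
    """Append each event to its key's partition, preserving arrival order."""
    partitions: Dict[int, List[Event]] = defaultdict(list)
    for key, seq in events:
        partitions[partition_for(key, partition_count)].append((key, seq))
    return dict(partitions)


def effective_parallelism(instances: int, partition_count: int) -> int:
    """Instances beyond the partition count have nothing to lease."""
    return min(instances, partition_count)
```

Events that share a key always land in the same partition and keep their relative order there; nothing is promised across keys that hash to different partitions. With 8 partitions, scaling to 50 instances still yields 8 active readers.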

Event Grid, Blob and Cosmos DB triggers

Event Grid delivers discrete events over HTTP push (Storage events, custom events, system topics). It validates the endpoint with a handshake, retries with exponential backoff on failure, and dead-letters to a blob container after the retry window. It is the right way to react to blob events at scale.

Blob trigger has two modes. The legacy polling mode scans the container and tracks receipts — simple but with high latency at scale (minutes) and a risk of missing events on very high churn. The production choice is Event Grid-based blob events, which push near-real-time and don’t degrade with container size.

Cosmos DB trigger consumes the change feed (inserts and updates, not deletes) using a lease container to track progress across partitions; like Event Hubs it scales with the source’s physical partitions and delivers at-least-once.

Trigger Latency Scaling unit Critical gotcha
Event Grid Near-real-time Event push (parallel) Must answer validation handshake (200)
Blob (polling) Minutes at scale Container scan Misses/lags on high churn — avoid in prod
Blob (Event Grid) Seconds Event push Requires Event Grid + storage event subscription
Cosmos DB change feed Seconds Source partitions Needs a lease container; no deletes; not transactional

Bindings: read and write services without the client code

A binding connects your function to a service declaratively. An input binding supplies data before your function runs; an output binding writes your return value after. The trigger is itself a special binding (direction in, trigger). Bindings cover most Azure data services and remove the connect/auth/serialize/dispose boilerplate — but they trade flexibility for convenience, and for anything fancy (transactions, custom retry, streaming) you still use the SDK directly.

// .NET isolated: triggered by a queue message, read a Cosmos doc, write to another queue
[Function("EnrichOrder")]
[QueueOutput("orders-enriched")]                                   // output binding
public string Run(
    [QueueTrigger("orders-in")] string orderId,                    // trigger
    // {queueTrigger} resolves to the queue message text, which is the order ID here
    [CosmosDBInput("shop","orders", Id="{queueTrigger}", PartitionKey="{queueTrigger}")] Order order) // input
{
    order.Enriched = true;
    return JsonSerializer.Serialize(order);
}

The bindings you reach for, and the direction(s) each supports:

Binding In Out Trigger Typical use
HTTP Yes Web endpoints
Timer Yes Schedules
Queue Storage Yes Yes Yes Lightweight work queues
Service Bus Yes Yes Enterprise messaging
Event Hubs Yes Yes Streaming / telemetry out
Event Grid Yes Yes Event publishing
Blob Storage Yes Yes Yes File read/write
Table Storage Yes Yes Cheap key-value state
Cosmos DB Yes Yes Yes Documents + change feed
SQL (Azure SQL) Yes Yes Yes Relational read/write/feed
SignalR Service Yes Yes Real-time push to clients
Durable client/entity Yes Yes Yes Start/query orchestrations

Four binding pitfalls worth knowing before you ship:

Pitfall What happens Fix
Binding expression typo ({orderId} vs {OrderId}) Binding resolves empty → null arg → crash Match the trigger property name exactly (case-sensitive)
Output binding never written Silent no-op (you returned but didn’t set it) Return the bound value, or use IAsyncCollector.AddAsync
Connection setting missing Binding can’t auth → host error at load Set <Name>__serviceUri/connection app setting (identity-based preferred)
Large payload through a binding Memory pressure, timeout Stream via SDK; pass a pointer, not the blob

Scaling and cold starts: the part everyone underestimates

On the serverless plans the scale controller is a platform component that watches each trigger’s signal and decides how many instances to run — from zero to the plan maximum. It reacts differently per trigger: HTTP scales on request rate/latency, queues scale on queue length, Event Hubs/Cosmos scale on partition lag, and the controller adds instances in steps (it won’t go from 0 to 200 in one tick). This is why a sudden burst sees a brief ramp, and why a queue that suddenly gets 100k messages drains over a minute or two rather than instantly.

A cold start is the latency the first request on a fresh instance pays: the platform allocates a sandbox, mounts your app, starts the language worker, JITs/loads your code, and primes connections — typically 1–10+ seconds depending on stack, package size and dependencies. It bites whenever an instance is created: scaling out, scaling back up from zero, or after a recycle. On Consumption you cannot avoid it entirely; Flex Consumption offers alwaysReady instances (a warm pool that’s always running for a given group); Premium keeps pre-warmed instances so scale-out never exposes a cold worker; Dedicated stays warm because it never scales to zero.

How each plan handles scale and cold start:

| Plan | Scales from 0 | Cold-start exposure | Warm mechanism | Max instances (typical) |
| --- | --- | --- | --- | --- |
| Consumption | Yes | On every new instance | None | ~200 |
| Flex Consumption | Yes | Only when above alwaysReady count | alwaysReady warm pool | High (configurable cap) |
| Premium (EP) | No (min 1) | Only if a burst outpaces the pre-warmed buffer | Pre-warmed instance count | Up to ~20–100 by OS/region |
| Dedicated | No | Per App Service (Always On) | Always On | Plan instance cap |
| Container Apps | Yes (KEDA) | On scale-from-zero | Min replicas > 0 | Replica cap |

What actually eats the cold-start budget, and how to cut each:

| Cold-start cost | Typical magnitude | Reduce it by | Trade-off |
| --- | --- | --- | --- |
| Sandbox + mount | 0.5–2 s | alwaysReady/pre-warmed; smaller package | Costs warm capacity |
| Language worker start | 0.3–3 s | Lighter runtime; .NET isolated trimming | Build complexity |
| Dependency load / DI | 0.5–5 s | Fewer/lighter packages; lazy init | First real call still primes |
| First connection (DB/KV) | 0.2–3 s | Reuse clients (static); pooled drivers | Must be singleton-safe |
| Package pull (large zip/container) | 1–30 s | Run-from-package; small image; same-region | Build discipline |
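
As a rough budget check, the ranges in the table can simply be added up. This sketch assumes the components run one after another and that all five apply to the same cold start, which is a simplification; the dictionary keys are illustrative names, not platform terms.

```python
# (low, high) seconds per component, copied from the table above.
COMPONENTS = {
    "sandbox_mount": (0.5, 2.0),
    "worker_start": (0.3, 3.0),
    "dependency_load": (0.5, 5.0),
    "first_connection": (0.2, 3.0),
    "package_pull": (1.0, 30.0),
}


def cold_start_range(components):
    """Best-case and worst-case total if every component is paid in sequence."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return round(low, 1), round(high, 1)


def biggest_contributor(components):
    """The component with the largest worst case: the first thing to cut."""
    return max(components, key=lambda name: components[name][1])


print(cold_start_range(COMPONENTS))     # (2.5, 43.0)
print(biggest_contributor(COMPONENTS))  # package_pull
```

The worst case is dominated by the package pull, so a small package usually buys more than any tuning of the other rows.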

Set Flex alwaysReady and Premium pre-warmed counts:

# Flex Consumption: keep 2 instances of the 'http' group always warm
az functionapp scale config always-ready set -g $RG -n $APP \
  --settings http=2

# Premium (EP): always-running floor of 1, elastic maximum of 20
az functionapp plan update -g $RG -n premium-plan \
  --min-instances 1 --max-burst 20
# Pre-warm 3 instances (the property lives on the app's config/web resource)
az resource update --resource-group $RG \
  --name $APP/config/web --resource-type "Microsoft.Web/sites" \
  --set properties.preWarmedInstanceCount=3

The scale knobs by plan, and what each caps:

| Knob | Plan | What it sets | Default | Why change |
| --- | --- | --- | --- | --- |
| maximumInstanceCount | Flex | Upper bound on instances | plan default | Protect a downstream; cap cost |
| alwaysReady | Flex | Warm instances per group | 0 | Kill cold start on hot paths |
| preWarmedInstanceCount | Premium | Buffer instances before traffic | 1 | Cover scale-out latency |
| minimumElasticInstanceCount | Premium | Always-running floor | 1 | Steady warm baseline |
| functionAppScaleLimit | Consumption/Premium | Hard instance cap | none | Stop runaway scale to a fragile dep |
| WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT | Consumption | Per-app scale cap | platform | Limit a single app’s footprint |

Concurrency, batching and partitioning: throughput vs ordering

Scale (instances) multiplies with concurrency (parallel invocations per instance) to give throughput. Push concurrency too high and you overwhelm a downstream (a database hits connection limits, an API throttles); too low and you under-use each instance and pay for more instances than you need. Each trigger family has its own concurrency model, and a few share the dynamic concurrency feature (the host auto-tunes concurrency from observed success/latency).
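
The multiplication is worth doing on paper before touching host.json. A minimal sketch, assuming every in-flight invocation holds one downstream connection and that latency stays flat under load (both are simplifications); the function names are illustrative.

```python
def throughput_per_sec(instances, concurrency, latency_ms):
    """Invocations completed per second across the whole app."""
    return instances * concurrency * 1000 / latency_ms


def peak_connections(instances, concurrency):
    """Worst-case simultaneous downstream connections (one per invocation)."""
    return instances * concurrency


def safe_concurrency(instances, connection_limit):
    """Largest per-instance concurrency that stays within the downstream limit."""
    return max(1, connection_limit // instances)


print(throughput_per_sec(10, 16, 200))  # 800.0
print(peak_connections(10, 16))         # 160
print(safe_concurrency(10, 100))        # 10
```

Here 10 instances at concurrency 16 would open 160 connections against a database that allows 100, so the per-instance setting has to drop to 10 (or the instance cap has to come down).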

The concurrency model per trigger, and the lever:

| Trigger | Concurrency lever | Where set | Ordering implication |
| --- | --- | --- | --- |
| HTTP | Instances × in-process parallelism | platform / host | None (stateless) |
| Queue Storage | batchSize + newBatchThreshold | host.json | None (no ordering) |
| Service Bus (no session) | maxConcurrentCalls | host.json | None |
| Service Bus (sessions) | maxConcurrentSessions | host.json | FIFO within a session |
| Event Hubs | partitions × batch size | hub + host.json | FIFO within a partition |
| Cosmos DB | source partitions × lease | Cosmos + lease | Per-partition |
| Durable | maxConcurrentActivityFunctions / orchestrations | host.json | Orchestrator-controlled |

Enable dynamic concurrency to let the host find the sweet spot under variable load:

{
  "concurrency": {
    "dynamicConcurrencyEnabled": true,
    "snapshotPersistenceEnabled": true
  }
}

The ordering-vs-throughput trade, stated plainly:

| You want… | Mechanism | Cost |
| --- | --- | --- |
| Maximum throughput, order irrelevant | High concurrency, many partitions/instances | Downstream pressure; must be idempotent |
| Strict ordering within a key | Service Bus sessions or Event Hubs partition key | Throughput capped by key/partition count |
| Even fan-out, no hot key | Good partition-key design (high cardinality) | Lose per-key ordering |
| Protect a fragile downstream | Cap maxConcurrentCalls / functionAppScaleLimit | Slower drain; possible backlog |

A worked sizing example: an Event Hub with 8 partitions gives at most 8 concurrent instances for that trigger, regardless of how many messages pile up. If each instance processes a batch of 100 in 200 ms, that is 500 events/sec per partition and a ceiling of ~4,000 events/sec overall. Need 40,000/sec? You need ~80 partitions (or fewer with bigger batches and faster handlers), which is more than the 32 partitions the Standard tier allows, so plan for Premium or Dedicated. On Standard the partition count is fixed at creation, so it is your scale ceiling; this is the single most common Event Hubs capacity mistake.
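
The same sizing arithmetic as a reusable sketch. It assumes one instance per partition and a steady batch time, as in the example; the function names are illustrative.

```python
import math


def ceiling_events_per_sec(partitions, batch_size, batch_ms):
    """Upper bound on throughput when each partition feeds one instance."""
    return partitions * batch_size * 1000 / batch_ms


def partitions_needed(target_events_per_sec, batch_size, batch_ms):
    """Smallest partition count whose ceiling meets the target rate."""
    per_partition = batch_size * 1000 / batch_ms
    return math.ceil(target_events_per_sec / per_partition)


print(ceiling_events_per_sec(8, 100, 200))   # 4000.0
print(partitions_needed(40_000, 100, 200))   # 80
print(partitions_needed(40_000, 200, 100))   # 20
```

The last line shows the alternative lever: doubling the batch and halving the handler time cuts the requirement from 80 partitions to 20.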

Durable Functions: stateful workflows on stateless compute

Plain functions are stateless and short-lived; many real processes are stateful and long-running — “validate, charge, ship, notify, and if anything fails, compensate,” running over minutes to days, surviving restarts. Durable Functions is an extension that adds this without a separate workflow engine. You write an orchestrator function (which coordinates) that calls activity functions (which do the work), and the framework checkpoints the orchestrator’s progress to the task hub in storage. When the orchestrator awaits, the platform can unload it entirely (you pay nothing while it waits hours for an approval) and later replay the orchestrator function from the start, using the checkpointed history to skip already-completed steps — which is why orchestrator code must be deterministic (no DateTime.Now, no random, no direct I/O; use the context’s APIs).
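
Replay is easier to trust once you have seen it run. The following is a toy model in plain Python generators, not the Durable Functions SDK: `run_once`, the tuple-shaped history and the activity names are all invented for illustration. It shows the two properties that matter: completed activities are never re-executed, and code that asks for a different step than the history recorded is detected.

```python
def run_once(orchestrator, history, activities):
    """Replay the orchestrator from the start, then run at most one new activity.

    Returns (done, result). A newly executed activity is checkpointed to
    `history`, after which the orchestrator is "unloaded" until the next call.
    """
    gen = orchestrator()
    try:
        request = next(gen)                      # (activity_name, argument)
        for recorded_name, recorded_result in history:
            if recorded_name != request[0]:
                raise RuntimeError(
                    f"non-deterministic replay: history has {recorded_name!r}, "
                    f"code asked for {request[0]!r}")
            request = gen.send(recorded_result)  # skip the completed step
    except StopIteration as finished:
        return True, finished.value
    name, argument = request
    history.append((name, activities[name](argument)))  # do the work, checkpoint
    return False, None


def order_flow():
    validated = yield ("validate", "o-1001")
    charged = yield ("charge", validated)
    return f"shipped:{charged}"


calls = []
activities = {
    "validate": lambda x: calls.append("validate") or f"{x}:ok",
    "charge": lambda x: calls.append("charge") or f"{x}:paid",
}

history, done, result = [], False, None
while not done:                                  # each pass is one load/unload cycle
    done, result = run_once(order_flow, history, activities)

print(result)  # shipped:o-1001:ok:paid
print(calls)   # ['validate', 'charge']
```

The orchestrator body ran three times from the top, yet each activity executed exactly once, because the second and third passes were fed results from the history.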

The four function types in Durable, and the rules each obeys:

| Type | Role | Constraints | Example |
| --- | --- | --- | --- |
| Orchestrator | Coordinates the workflow | Deterministic: no I/O, no clocks/random, no await except on durable APIs | “call A → B → wait → C” |
| Activity | Does the actual work | Any code, side effects allowed | Charge a card, send email |
| Entity | Stateful actor (small state) | Single-threaded per entity key | A per-user counter, a cart |
| Client | Starts/queries orchestrations | Triggered by HTTP/queue/etc. | Webhook that kicks off a flow |

The orchestration patterns

The patterns are the reason Durable exists. Each solves a class of coordination problem cleanly:

| Pattern | Problem it solves | Mechanism |
| --- | --- | --- |
| Function chaining | Run steps in strict sequence (output → input) | await ctx.CallActivityAsync in order |
| Fan-out / fan-in | Parallelize N items, then aggregate | Start N activities, await Task.WhenAll |
| Async HTTP API | Long job behind a quick HTTP 202 + status URL | Client starts orchestration, returns status endpoint |
| Monitor | Poll a resource until a condition, with timeout | Orchestrator loops with CreateTimer |
| Human interaction | Wait for approval/input (minutes–days) | WaitForExternalEvent + timeout |
| Aggregator (entities) | Accumulate state from many events, single-threaded | Durable Entities |

Fan-out/fan-in — process every line of an order in parallel, then reconcile:

[Function("ProcessOrder")]
public static async Task<OrderResult> Run(
    [OrchestrationTrigger] TaskOrchestrationContext ctx)
{
    var order = ctx.GetInput<Order>();

    // Fan out: one activity per line item, all in parallel
    var tasks = order.Lines.Select(line =>
        ctx.CallActivityAsync<LineResult>("ProcessLine", line)).ToList();

    // Fan in: wait for all, then aggregate
    LineResult[] results = await Task.WhenAll(tasks);
    return new OrderResult(results);
}
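
The shape of fan-out/fan-in can also be tried outside Azure. This is a plain asyncio analogy, not Durable Functions code: `process_line` stands in for an activity, and the order fields and prices are made up. A real orchestrator would call the durable context's APIs rather than `asyncio.sleep`.

```python
import asyncio


async def process_line(line):
    """Stand-in for an activity: price one line item."""
    await asyncio.sleep(0.01)  # simulated work
    return {"sku": line["sku"], "total": line["qty"] * line["price"]}


async def process_order(order):
    # Fan out: start one task per line item without awaiting each in turn.
    tasks = [process_line(line) for line in order["lines"]]
    # Fan in: gather waits for all of them and keeps the input order.
    results = await asyncio.gather(*tasks)
    return {"order_id": order["id"], "total": sum(r["total"] for r in results)}


order = {
    "id": "o-1001",
    "lines": [
        {"sku": "rice-5kg", "qty": 2, "price": 300},
        {"sku": "dal-1kg", "qty": 1, "price": 120},
    ],
}
print(asyncio.run(process_order(order)))  # {'order_id': 'o-1001', 'total': 720}
```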

Human-interaction with a timeout (approve within 72 hours or escalate):

using var cts = new CancellationTokenSource();
DateTime deadline = ctx.CurrentUtcDateTime.AddHours(72);       // deterministic clock
Task timeout = ctx.CreateTimer(deadline, cts.Token);
Task<bool> approval = ctx.WaitForExternalEvent<bool>("ApprovalEvent");

if (approval == await Task.WhenAny(approval, timeout)) {
    cts.Cancel();
    if (approval.Result) await ctx.CallActivityAsync("Ship", order);
} else {
    await ctx.CallActivityAsync("Escalate", order);             // timed out
}
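
The race between an approval and a deadline has three outcomes, not two: approved, rejected and timed out. The snippet above does nothing on a rejection, which may or may not be intended. A plain asyncio analogy (not Durable code; `decide` and the outcome strings are invented for illustration) makes all three explicit:

```python
import asyncio


async def decide(approval, timeout_s):
    """Wait for the approval future, but no longer than timeout_s."""
    try:
        approved = await asyncio.wait_for(approval, timeout=timeout_s)
    except asyncio.TimeoutError:
        return "escalate"                    # nobody answered in time
    return "ship" if approved else "rejected"


async def demo():
    loop = asyncio.get_running_loop()
    answered = loop.create_future()
    loop.call_later(0.01, answered.set_result, True)  # approval arrives quickly
    silent = loop.create_future()                     # approval never arrives
    return await decide(answered, 1.0), await decide(silent, 0.05)


print(asyncio.run(demo()))  # ('ship', 'escalate')
```

In a real orchestrator the timer must come from the durable context so the wait survives an unload; `asyncio.wait_for` is only a stand-in for that behaviour.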

The determinism rules — break one and you get non-deterministic replay (the classic Durable bug):

| Don’t (in an orchestrator) | Why | Do instead |
| --- | --- | --- |
| DateTime.Now / DateTimeOffset.UtcNow | Different value on replay | ctx.CurrentUtcDateTime |
| Guid.NewGuid() / random | Non-deterministic | ctx.NewGuid() |
| Direct HTTP / DB / file I/O | Side effects re-run on replay | Call an activity that does the I/O |
| Task.Delay / Thread.Sleep | Not durable; blocks | ctx.CreateTimer(...) |
| await non-durable tasks | Breaks the replay model | Only await durable APIs |
| Static mutable state | Leaks across replays/instances | Pass state through the orchestrator |

Durable behaviours and limits you should size for:

| Aspect | Behaviour | Limit / note |
| --- | --- | --- |
| Task hub storage | Queues + tables in the storage account | Throttling here stalls all orchestrations |
| History growth | Each step appends to history | Use ContinueAsNew for eternal/long loops |
| Concurrency | maxConcurrentActivityFunctions etc. | Tune in host.json to protect downstreams |
| Backend choice | Azure Storage (default), Netherite, MSSQL | Netherite/MSSQL for high throughput |
| Versioning | In-flight orchestrations replay their recorded history against the newly deployed code | Don’t break history shape on deploy |
| Sub-orchestrations | Orchestrators calling orchestrators | Compose large workflows; mind history size |

The reference architecture for a serverless order workflow that combines several of these patterns is in Reference Architecture: Serverless API on Azure.

Idempotency, retries and poison messages: designing for at-least-once

Because every messaging trigger delivers at least once, a correct function must produce the same result whether it sees a message once or five times — that’s idempotency. The realistic failure flow: your function pulls a message, does half the work, then crashes (or its lock/visibility expires); the message becomes visible again and is redelivered; without idempotency you double-charge a card or write a duplicate row.

The idempotency techniques, and when each fits:

| Technique | How it works | Best for |
| --- | --- | --- |
| Idempotency key (dedup store) | Record a unique message id; skip if seen | Side-effecting writes (charge, email) |
| Upsert by natural key | Write is “set to X” not “add X” | Database records |
| Conditional write (ETag/If-Match) | Reject if state changed underneath | Optimistic concurrency |
| Idempotent downstream | The API itself dedups on a key | Payment providers with idem keys |
| Exactly-once via transaction | Settle message + write atomically | Service Bus + DB (sessions/Tx) |
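
The first two techniques fit in a few lines. This is a minimal in-memory sketch: `DedupStore` stands in for a durable table, and the message fields are assumptions made for the example.

```python
class DedupStore:
    """In-memory stand-in for a durable dedup table keyed on message id."""

    def __init__(self):
        self._seen = set()

    def try_claim(self, message_id):
        """Return True the first time an id is seen, False on every repeat."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True


def charge(store, ledger, message):
    """Idempotency key: apply the charge at most once per message id."""
    if not store.try_claim(message["id"]):
        return "skipped"
    ledger[message["order_id"]] = ledger.get(message["order_id"], 0) + message["amount"]
    return "charged"


def upsert_status(orders, order_id, status):
    """Upsert by natural key: the write is 'set to X', so repeats are harmless."""
    orders[order_id] = status


store, ledger, orders = DedupStore(), {}, {}
msg = {"id": "m-1", "order_id": "o-1001", "amount": 500}
outcomes = [charge(store, ledger, msg) for _ in range(3)]  # delivered three times
for _ in range(3):
    upsert_status(orders, "o-1001", "paid")

print(outcomes)  # ['charged', 'skipped', 'skipped']
print(ledger)    # {'o-1001': 500}
print(orders)    # {'o-1001': 'paid'}
```

One caveat on the design: claiming the id before doing the work means a crash between the two loses the charge, while marking it afterwards means a crash repeats it. In production the claim and the side effect need to be atomic, or the downstream needs its own idempotency key.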

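Retries: the host has a retry policy (fixed or exponential) for trigger-level retries, but on runtime v4 it applies only to the Timer, Kafka, Cosmos DB and Event Hubs triggers. Queue Storage and Service Bus rely on the source’s own redelivery instead (queue dequeue count, Service Bus delivery count). After retries are exhausted, a queue or Service Bus message is poisoned/dead-lettered, that is, moved aside so it stops blocking the queue. You must monitor and drain these, or failures pile up silently.

[Function("ChargeOrder")]
[FixedDelayRetry(5, "00:00:10")]              // host retry: up to 5 retries, 10 s apart
public async Task Run(
    // "EventHubConnection" is an example app-setting name; use your own
    [EventHubTrigger("orders", Connection = "EventHubConnection")] OrderMessage[] batch)
{ /* idempotent charge for each message in the batch */ }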
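
To see what a retry policy costs in wall-clock time, compare the two schedules. This sketch models exponential back-off as plain doubling capped at a maximum; the host's real algorithm may add randomisation, so treat the exponential numbers as an approximation.

```python
def fixed_delays(max_retries, delay_s):
    """Fixed policy: the same wait before every retry."""
    return [delay_s] * max_retries


def exponential_delays(max_retries, min_s, max_s):
    """Simplified exponential policy: double each time, never above max_s."""
    return [min(min_s * 2 ** attempt, max_s) for attempt in range(max_retries)]


fixed = fixed_delays(5, 10)
backoff = exponential_delays(5, 4, 60)

print(fixed, sum(fixed))      # [10, 10, 10, 10, 10] 50
print(backoff, sum(backoff))  # [4, 8, 16, 32, 60] 120
```

The sum is the extra time a failing invocation can hold its place before the host gives up, which matters when the function timeout or a message lock is shorter than that.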

The delivery/retry mechanics per source, and where the failed message ends up:

| Source | Redelivery counter | Default before set-aside | Set-aside destination |
| --- | --- | --- | --- |
| Queue Storage | dequeueCount | 5 | <queue>-poison |
| Service Bus | DeliveryCount | 10 (maxDeliveryCount) | Built-in dead-letter sub-queue |
| Event Hubs | (no per-message DLQ) | n/a (checkpoint advances) | None (handle in code or sideline) |
| Event Grid | retry schedule | ~24 h window | Dead-letter blob container |
| Cosmos DB | lease retry | per host policy | None (handle in code) |
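
The Queue Storage row can be modelled in a few lines. This is an in-memory sketch of redelivery and set-aside, not the Storage SDK; `drain` and its message dictionaries are invented for illustration.

```python
import json
from collections import deque


def drain(bodies, handler, max_dequeue_count=5):
    """Deliver each message until the handler succeeds or the count is exhausted.

    Returns (processed, poison, attempts) where attempts maps body -> tries.
    """
    queue = deque({"body": body, "dequeue_count": 0} for body in bodies)
    processed, poison, attempts = [], [], {}
    while queue:
        msg = queue.popleft()
        msg["dequeue_count"] += 1
        attempts[msg["body"]] = msg["dequeue_count"]
        try:
            handler(msg["body"])
            processed.append(msg["body"])
        except Exception:
            if msg["dequeue_count"] >= max_dequeue_count:
                poison.append(msg["body"])   # set aside, stops blocking the queue
            else:
                queue.append(msg)            # visible again, redelivered later
    return processed, poison, attempts


processed, poison, attempts = drain(
    ['{"Id": "o-1"}', "not-json", '{"Id": "o-2"}'], json.loads)

print(processed)             # ['{"Id": "o-1"}', '{"Id": "o-2"}']
print(poison)                # ['not-json']
print(attempts["not-json"])  # 5
```

The good messages are handled once each while the bad one burns five attempts and is then moved aside, which is exactly the behaviour the poison queue exists to provide.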

A symptom→cause→confirm→fix table for the messaging failure classes, because this is where production bites:

| # | Symptom | Likely cause | Confirm (exact path/cmd) | Fix |
| --- | --- | --- | --- | --- |
| 1 | Messages in <q>-poison growing | Handler throws every time on a bad message | Check the poison queue depth in Storage; read a message | Make handler tolerant; fix data; reprocess after fix |
| 2 | Same record processed twice | At-least-once + no idempotency | App Insights shows duplicate operation ids | Add idempotency key / upsert |
| 3 | Service Bus messages re-appear after the lock duration (default 1 min) | Lock expired before processing finished | maxAutoLockRenewalDuration too low; long handler | Raise lock renewal; shorten work; checkpoint |
| 4 | Out-of-order processing | No sessions / multiple partitions | Events on different partitions/instances | Use sessions or partition key for ordering |
| 5 | Backlog never drains | Scale ceiling hit (partitions, scale limit) | Partition count = max instances; functionAppScaleLimit | Add partitions; raise/relax the cap |
| 6 | Dead-letter on Service Bus filling | maxDeliveryCount exceeded | DLQ depth in the portal/CLI | Inspect DLQ, fix root cause, resubmit |
| 7 | Event Grid events lost | Endpoint failed validation or 5xx’d | Event Grid metrics: delivery failures | Return 200 on validation; fix handler; check DLQ blob |

Networking and identity for production functions

Demos run on default networking and connection strings; production needs private outbound and passwordless identity. On Flex Consumption, Premium, Dedicated and Container Apps you can VNet-integrate the function app so its outbound traffic flows through your virtual network, then reach databases and PaaS via private endpoints — keeping traffic off the public internet. (Plain Consumption cannot VNet-integrate — a frequent reason to choose Flex.) For identity, give the app a managed identity and use identity-based connections for triggers/bindings (<Name>__serviceUri + an RBAC role) instead of connection strings, and Key Vault references for any remaining secrets.

The networking/identity options and what each requires:

| Capability | Mechanism | Plans that support it | Why |
| --- | --- | --- | --- |
| Private outbound to VNet | VNet integration | Flex, Premium, Dedicated, Container Apps | Reach private DB/PaaS; egress control |
| Private inbound | Private endpoint on the app | Premium, Dedicated, Flex (evolving) | No public ingress |
| Reach PaaS privately | Private endpoint on target + DNS | Any VNet-integrated plan | Storage/SQL/Cosmos off the internet |
| Passwordless to PaaS | Managed identity + RBAC | All | No secrets to leak/rotate |
| Identity-based trigger/binding conn | <Name>__serviceUri + role | All (binding-dependent) | Remove connection strings |
| Secrets when unavoidable | Key Vault reference | All | Secret out of app settings plaintext |
| Restrict who can call HTTP | Access restrictions / Easy Auth / APIM | All | Lock the endpoint |

Wire identity-based access end to end — give the app an identity and grant it queue + blob roles, no keys:

# 1) System-assigned identity
az functionapp identity assign -g $RG -n $APP
PID=$(az functionapp identity show -g $RG -n $APP --query principalId -o tsv)

# 2) Grant it data-plane roles on the storage account (queues + blobs)
SID=$(az storage account show -n $STG -g $RG --query id -o tsv)
az role assignment create --assignee $PID --role "Storage Queue Data Contributor" --scope $SID
az role assignment create --assignee $PID --role "Storage Blob Data Owner"       --scope $SID

# 3) Point the trigger/binding at the account by URI (identity-based), not a key.
#    A system-assigned identity needs only the URI; add Orders__credential=managedidentity
#    plus Orders__clientId only when you use a user-assigned identity.
az functionapp config appsettings set -g $RG -n $APP --settings \
  "Orders__queueServiceUri=https://$STG.queue.core.windows.net/"

// Reference a Key Vault secret from an app setting (the app's MI must have 'Key Vault Secrets User')
appSettings: [
  {
    name: 'PaymentApiKey'
    value: '@Microsoft.KeyVault(SecretUri=https://kv-shop.vault.azure.net/secrets/payment-key/)'
  }
]

The identity roles a function commonly needs, by what it touches:

| The function… | Needs role | On |
| --- | --- | --- |
| Reads/writes Storage queues | Storage Queue Data Contributor | The storage account |
| Reads/writes blobs | Storage Blob Data Contributor/Owner | The storage account |
| Reads/writes Cosmos DB | Cosmos DB Built-in Data Contributor | The Cosmos account |
| Reads Key Vault secrets | Key Vault Secrets User | The key vault |
| Sends to Service Bus | Azure Service Bus Data Sender | The namespace/queue |
| Receives from Service Bus | Azure Service Bus Data Receiver | The namespace/queue |
| Sends to Event Hubs | Azure Event Hubs Data Sender | The namespace/hub |

Limits and the error reference

Keep this open. First, the platform limits that shape design decisions:

| Limit | Consumption | Flex / Premium | Note |
| --- | --- | --- | --- |
| Max execution time | 5 min default, 10 min hard | Long / unbounded (EP default 30 min) | The classic reason to leave Consumption |
| Max instances | ~200 | High (configurable) | Per-app scale cap available |
| Memory per instance | ~1.5 GB | Choose (e.g. 512 MB–4 GB+) | Flex/Premium let you size it |
| HTTP response timeout | ~230 s (front end) | ~230 s | Return 202 + async for long work |
| Queue message size | 64 KB | 64 KB | Pointer pattern for larger |
| Service Bus message | 256 KB (Std) / 1 MB (Prem) | same | Large-message support on Premium |
| App settings size | ~32 KB total | same | Don’t stuff payloads into settings |
| Storage dependency | Required | Required | Host won’t start without it |
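
The pointer pattern for the 64 KB queue limit looks roughly like this. It is a sketch under stated assumptions: a plain dictionary stands in for blob storage, the envelope format (`kind`, `data`, `blob`) is invented for the example, and the size check counts the Base64-encoded envelope, since encoding inflates a payload by about a third.

```python
import base64
import json
import uuid

QUEUE_LIMIT_BYTES = 64 * 1024


def to_queue_message(payload, blob_store):
    """Send small payloads inline; park large ones in the blob store and send a pointer."""
    encoded = base64.b64encode(payload).decode("ascii")
    inline = json.dumps({"kind": "inline", "data": encoded})
    if len(inline.encode("utf-8")) <= QUEUE_LIMIT_BYTES:
        return inline
    blob_name = f"payloads/{uuid.uuid4()}"
    blob_store[blob_name] = payload          # stand-in for a blob upload
    return json.dumps({"kind": "pointer", "blob": blob_name})


def from_queue_message(message, blob_store):
    """Resolve either envelope kind back to the original bytes."""
    envelope = json.loads(message)
    if envelope["kind"] == "inline":
        return base64.b64decode(envelope["data"])
    return blob_store[envelope["blob"]]


blobs = {}
small = to_queue_message(b"hello", blobs)
large = to_queue_message(b"x" * 100_000, blobs)

print(json.loads(small)["kind"], json.loads(large)["kind"])  # inline pointer
print(len(large.encode("utf-8")) < 200)                      # True
```

The consumer does not need to know which path was taken: `from_queue_message` returns the same bytes either way, and the queue only ever carries a short message.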

The host/runtime errors you’ll actually see, and what each means:

| Error / symptom | Meaning | Likely cause | First fix |
| --- | --- | --- | --- |
| “Azure Functions runtime is unreachable” | Host can’t start | AzureWebJobsStorage broken (key rotated, firewall, deleted) | Fix the storage connection/identity/firewall |
| HTTP 503 on the function URL | No healthy host/instance | Host crash-looping; cold start mid-deploy | Check App Insights traces; redeploy |
| HTTP 429 from your function | Throttled | Daily quota (Consumption) or downstream throttling | Check functionAppScaleLimit/quota; back off |
| Function timeout (504-like) | Exceeded functionTimeout | Long work on Consumption (10 min cap) | Move to Premium/Flex or use Durable async |
| Binding error at load | Function not indexed | Missing connection setting / bad binding | Set the connection app setting; fix binding |
| Messages stuck, none processed | Trigger not firing | Host down, or storage/lease unreachable | Check host status + storage health |
| Duplicate executions | At-least-once + retries | Crash mid-process; lock expiry | Add idempotency |
| “Did not find functions with language…” | Wrong runtime/worker | FUNCTIONS_WORKER_RUNTIME mismatch | Match runtime to your code |
| Cold start spikes | Fresh instance latency | Scale-out / scale-from-zero | alwaysReady/pre-warmed; smaller package |
| Durable orchestration stuck | Replay/determinism or task-hub issue | Non-deterministic orchestrator; storage throttled | Fix determinism; check task-hub storage |

The critical app settings for a function app, beyond the bindings:

| Setting | Controls | Typical value | Note |
| --- | --- | --- | --- |
| AzureWebJobsStorage | Backing storage | connection (account/identity) | Required; identity-based preferred |
| FUNCTIONS_EXTENSION_VERSION | Runtime major version | ~4 | Pin to a supported major |
| FUNCTIONS_WORKER_RUNTIME | Language worker | dotnet-isolated/node/python | Must match your code |
| WEBSITE_RUN_FROM_PACKAGE | Run from immutable package | 1 | Atomic deploys, faster cold start |
| APPLICATIONINSIGHTS_CONNECTION_STRING | Telemetry target | (connection string) | Always set in prod |
| functionTimeout (host.json) | Per-function max duration | plan-dependent | Bounded by plan’s hard cap |
| WEBSITE_CONTENTAZUREFILECONNECTIONSTRING | Content share (some plans) | (account) | Keep consistent with storage |
| functionAppScaleLimit | Max instances | none/number | Protect downstreams |

Architecture at a glance

The diagram traces a real serverless order pipeline left to right, and marks the five hops where things break. Producers — a public client over HTTPS and an upstream system dropping messages — enter on the left. The ingress/trigger zone holds the two front doors: an HTTP-triggered function (behind Application Gateway/Easy Auth, scaling on request rate) and a Service Bus queue that buffers order messages and absorbs spikes so the back end never has to. From there the compute zone is the heart: the scale controller decides how many function instances run (zero to the cap), and a Durable orchestrator coordinates the multi-step workflow — fanning out line-item activities in parallel and waiting for an approval — checkpointing its state to the task hub. The state/dependencies zone is everything the functions read and write through bindings and identity: the backing storage account (runtime state + task hub), Cosmos DB for orders, Key Vault for the payment secret, all reached privately. Finally the observability plane (Application Insights) sees every invocation, dependency call and failure across the whole path.

Read the numbered badges as the failure map. Badge 1 sits on the trigger: a cold start on a fresh instance makes the first call slow — confirm with App Insights request duration after a gap, fix with alwaysReady/pre-warmed instances. Badge 2 is on the queue→instance hop: at-least-once delivery means duplicate processing — confirm with duplicate operation ids, fix with idempotency. Badge 3 is the scale ceiling: a backlog that won’t drain because partitions or the scale limit cap concurrency — confirm by comparing partition count to instance count, fix by adding partitions or raising the cap. Badge 4 is the Durable orchestrator: a non-deterministic orchestrator stalls or misbehaves on replay — confirm with the orchestration history, fix by removing clocks/random/I/O. Badge 5 is the backing storage: if AzureWebJobsStorage is throttled or unreachable, the whole host won’t start — confirm with “runtime unreachable,” fix the storage connection/firewall/identity. The lesson the picture teaches: the function code is the small part; the event source, the scale ceiling, the state store and the delivery guarantee are where serverless systems actually live or die.

Figure: Azure Functions serverless order-pipeline architecture. Producers enter an ingress zone (HTTP-triggered function, Service Bus queue), followed by a compute zone (scale controller, Durable orchestrator), a state and dependencies zone (backing storage account, Cosmos DB, Key Vault) and an Application Insights observability plane, with five numbered failure badges.

Real-world scenario

Saffron Mart, a mid-size Indian grocery e-commerce company, ran its order pipeline on a pair of always-on App Service instances and a couple of VMs for background jobs. Order processing — validate, reserve stock, charge, generate invoice, notify — was a synchronous chain inside the web app, so a slow payment provider made checkout itself slow, and the nightly invoice batch needed its own VM that sat idle 23 hours a day. Monthly spend on this machinery was about ₹62,000, and during festival sales the synchronous chain buckled: checkout p95 climbed past 8 seconds and stock oversold because two requests reserved the same item.

The platform team (three engineers) moved the pipeline to Azure Functions on Flex Consumption. Checkout became a thin HTTP-triggered function that did one thing — validate the cart and drop an order message on a Service Bus queue — then returned 202 Accepted with a status URL. A Durable Functions orchestrator, started from the queue, ran the real workflow: a fan-out to reserve each line item in parallel, then an activity to charge (idempotent, keyed on the order id, against a payment provider that supports idempotency keys), then invoice generation, then notification. The nightly invoice job became a timer-triggered function — no VM. Everything authenticated with a managed identity: the queue, Cosmos DB and Key Vault (for the payment key) were all reached without a single connection string, and Cosmos was behind a private endpoint.

The first festival sale on the new system exposed three lessons. First, cold starts: at the very start of the flash sale, the first wave of checkout calls saw 4–6 second latencies because the app had scaled to zero overnight and the burst hit cold instances. They set Flex alwaysReady=2 on the HTTP group and the cold spikes vanished. Second, duplicate charges: an early bug (a non-idempotent charge activity) meant a redelivered message double-charged a handful of customers during a transient Service Bus lock expiry. The fix was a dedup store keyed on the order id, checked before charging; the lock-renewal duration was also raised because the charge call occasionally took longer than the default lock. Third, backlog: at peak the order queue briefly grew to ~40,000 messages and drained slower than expected. To protect Cosmos, the team had capped the app’s maximum instance count too conservatively at 20; raising it to 60 and bumping Cosmos throughput cleared the backlog in under two minutes.

The outcome: checkout p95 dropped from 8 s to 310 ms (because checkout no longer waited for the workflow), stock oversell went to zero (line-item reservation became an idempotent, ordered-per-item operation via session-keyed messaging), the nightly VM was deleted, and the monthly bill fell to about ₹28,000 — the serverless pipeline cost nothing at 3am and scaled itself during the sale. The architecture lesson on the team wall: “Make checkout drop a message and walk away. The workflow is Durable’s problem, the scale is the platform’s problem, and at-least-once is your problem — so be idempotent.”

The migration as a before/after, because the shape of the change is the lesson:

| Concern | Before (App Service + VMs) | After (Functions + Durable) | Effect |
| --- | --- | --- | --- |
| Checkout latency (p95) | ~8 s (synchronous chain) | 310 ms (drop message, return 202) | ~26× faster perceived |
| Order workflow | Inline, blocking | Durable orchestration (fan-out + chained activities) | Resilient, checkpointed |
| Nightly invoices | Dedicated VM, idle 23h | Timer-triggered function | VM deleted |
| Stock oversell at peak | Race on shared item | Idempotent, session-ordered reserve | Zero oversell |
| Secrets | Connection strings in config | Managed identity + KV references | No secrets to leak |
| Cold start at sale start | n/a (always-on, expensive) | Killed with Flex alwaysReady=2 | No first-wave spikes |
| Monthly cost | ~₹62,000 | ~₹28,000 | ~55% lower |

Advantages and disadvantages

The event-driven, pay-per-execution model both enables the wins above and introduces a class of problems you don’t have with always-on compute. Weigh it honestly:

| Advantages (why serverless helps) | Disadvantages (why it bites) |
| --- | --- |
| Scale-to-zero: pay nothing at idle; ideal for spiky/low traffic | Cold starts add first-request latency on fresh instances |
| Automatic scale to the event rate; no autoscale rules to write | The platform decides scale; bursty ramps and ceilings can surprise you |
| Bindings remove client boilerplate for dozens of services | Bindings hide details; complex needs (Tx, streaming) still need the SDK |
| Durable Functions gives stateful workflows without a workflow engine | Orchestrator determinism rules are subtle; non-deterministic bugs are nasty |
| No servers to patch, scale or load-balance | You can’t ssh to “the server”; you debug through logs/App Insights |
| Per-execution billing tracks real usage closely | At-least-once delivery forces idempotency on you (duplicate executions) |
| Tight integration with the Azure event ecosystem | Vendor lock-in: triggers/bindings/Durable are Azure-specific |
| Strong fit for glue, automation, event processing, schedules | Wrong for long-running compute, ultra-low-latency APIs, persistent local state |

Where each matters: serverless is right when work is event-shaped and intermittent, when you want to ship logic not operate hosts, and when occasional cold starts are tolerable (or killable with warm pools). It’s wrong for steady high-CPU compute (you’d pay more than a reserved VM and fight timeouts), for APIs with a hard sub-100 ms p99 and zero cold-start tolerance (use Premium-warmed or a different model), and for anything needing durable local disk or in-memory state across calls. The disadvantages are all manageable — but only if you design for them up front, which is the entire point of this article.

Hands-on lab

Build a queue-triggered, idempotent function on the free-friendly Consumption plan, watch it process and poison a bad message, then tear it down. Run in Cloud Shell (Bash); the runtime + a small storage account stay inside or near the free tier.

Step 1 — Variables and resource group.

RG=rg-fn-lab
LOC=centralindia
STG=stfnlab$RANDOM        # globally-unique, 3-24 lowercase
APP=fn-lab-$RANDOM        # globally-unique
az group create -n $RG -l $LOC -o table

Step 2 — Storage account + Consumption function app (.NET isolated).

az storage account create -n $STG -g $RG -l $LOC --sku Standard_LRS -o table
az functionapp create -n $APP -g $RG \
  --consumption-plan-location $LOC \
  --runtime dotnet-isolated --runtime-version 8.0 \
  --functions-version 4 \
  --storage-account $STG -o table

Expected: a function app row, state = Running.

Step 3 — Create the work queue and the (auto-created) poison queue.

KEY=$(az storage account keys list -n $STG -g $RG --query "[0].value" -o tsv)
az storage queue create -n orders-in --account-name $STG --account-key "$KEY" -o table
# The 'orders-in-poison' queue is created automatically on first poison event.

Step 4 — Deploy a queue-triggered function. (Author locally with func init/func new and func azure functionapp publish $APP, or deploy a zip.) The handler is idempotent — it records processed ids in Table storage and skips duplicates, and it throws on a deliberately bad payload so you can watch poisoning:

[Function("ProcessOrder")]
public async Task Run([QueueTrigger("orders-in")] string body)
{
    var msg = JsonSerializer.Deserialize<OrderMsg>(body)
              ?? throw new InvalidOperationException("bad payload"); // -> retried -> poison
    if (await AlreadyProcessed(msg.Id)) return;                     // idempotent skip
    await DoWork(msg);
    await MarkProcessed(msg.Id);
}

Step 5 — Send a good message and watch it process.

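GOOD='{"Id":"o-1001","Item":"rice-5kg"}'
# The queue trigger expects Base64-encoded message text by default, so encode before sending
az storage message put -q orders-in --content "$(printf '%s' "$GOOD" | base64 -w0)" \
  --account-name $STG --account-key "$KEY" -o table
# Stream logs and watch the invocation succeed (Ctrl+C to stop):
az webapp log tail -n $APP -g $RG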

Expected: one successful invocation in the log; the message disappears from orders-in.

Step 6 — Send a bad message and watch it poison.

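# Base64-encode the bad payload too, so it reaches the handler and fails there
az storage message put -q orders-in --content "$(printf '%s' 'not-json' | base64 -w0)" \
  --account-name $STG --account-key "$KEY" -o table
# After 5 failed dequeue attempts it lands in orders-in-poison. Allow for the idle
# polling back-off (up to about a minute) plus the retries before peeking:
sleep 90
az storage message peek -q orders-in-poison --account-name $STG --account-key "$KEY" -o table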

Expected: after the retries, the bad message appears in orders-in-poison — proof that one bad message doesn’t block the queue forever.

Step 7 — Confirm idempotency. Re-send the same good id (o-1001); the handler runs but the AlreadyProcessed check skips the work — no duplicate side effect. Verify in the log: the invocation completes without doing work twice.

Step 8 — Teardown.

az group delete -n $RG --yes --no-wait

You’ve now seen the three things that define serverless event processing in production: it scales to the queue, it sets bad messages aside instead of looping, and it stays correct under duplicate delivery because you made it idempotent.

Common mistakes & troubleshooting

The same dozen mistakes account for most Functions incidents. Each is symptom → root cause → confirm (exact path/command) → fix.

1. “Azure Functions runtime is unreachable” — the whole app is down. Root cause: The backing storage account (AzureWebJobsStorage) is broken — its access key rotated without updating the setting, a firewall now blocks the app, or the account/container was deleted. Confirm: Portal banner on the function app; az functionapp config appsettings list -n $APP -g $RG --query "[?name=='AzureWebJobsStorage']"; check the storage account’s networking/firewall and that the key matches. Fix: Repair the connection (new key or, better, switch to identity-based AzureWebJobsStorage__accountName + role); allow the app through the storage firewall; never let the host’s storage be unreachable.

2. Function runs twice (or N times) for one event. Root cause: At-least-once delivery plus a crash/lock-expiry mid-process; the message is redelivered. Not a platform bug — expected behaviour. Confirm: App Insights requests/traces show the same operation/message id processed more than once; Service Bus DeliveryCount > 1. Fix: Make the handler idempotent — dedup store keyed on message id, upsert by natural key, or use a downstream that dedups. Never assume exactly-once.

3. Messages pile up in the poison/dead-letter queue. Root cause: The handler throws on certain messages every time (bad data, a downstream that’s down), so they exhaust retries and are set aside — and nobody is draining them. Confirm: Storage <queue>-poison depth, or Service Bus DLQ depth (az servicebus queue show ... --query countDetails.deadLetterMessageCount). Fix: Alert on poison/DLQ depth; read a poisoned message to find the cause; fix the data/downstream; reprocess (move messages back). Make handlers tolerant of expected-bad input rather than throwing.

4. Long-running function times out. Root cause: The work exceeds the plan’s max execution time — 10 minutes hard on Consumption. Confirm: App Insights shows the invocation cut at the timeout; functionTimeout in host.json vs the plan cap. Fix: Move to Premium/Flex (long/unbounded timeout) for genuinely long work, or refactor to Durable Functions (the async pattern: return immediately, run the long workflow in pieces).

5. HTTP function returns 502/503 intermittently. Root cause: Cold start mid-deploy, host crash-looping, or the response exceeded the ~230 s front-end timeout. Confirm: App Insights requests with failures; correlate to deploys/scale events; check function duration against 230 s. Fix: Use alwaysReady/pre-warmed instances; fix the crash (see traces); for long work return 202 + status URL instead of blocking. (Same front-end mechanics as App Service 502/503 troubleshooting.)

6. The app won’t scale out — backlog grows. Root cause: A scale ceiling: Event Hubs/Cosmos partition count caps instances, or functionAppScaleLimit/WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT is set low, or you’re on Dedicated (no elastic scale). Confirm: Compare partition count to instance count (App Insights cloud_RoleInstance cardinality); read the scale-limit settings. Fix: Add partitions (at the source — can’t change later cheaply), raise/remove the scale cap, or move to a plan that scales elastically.

7. Cold-start latency on a “warm” Premium plan. Root cause: Pre-warmed count is 1 (default) and a burst scaled out faster than the buffer covered; or you scaled past maxBurst. Confirm: App Insights shows latency spikes correlated with new cloud_RoleInstance values during a burst. Fix: Raise preWarmedInstanceCount and maxBurst; keep the deployment package small; reuse clients so per-instance warm-up is cheap.

8. Timer fired multiple times / didn’t fire after a restart. Root cause: For multiples — a misconfiguration broke the singleton lock (rare) or you confused it with RunOnStartup firing on every restart. For misses — the host was down during the schedule and you didn’t opt into catch-up. Confirm: Function execution log timestamps; check for RunOnStartup=true; verify the storage lock container. Fix: Remove RunOnStartup in production; rely on the storage-backed singleton; for critical schedules, make the job idempotent and tolerant of a missed/duplicate run.

9. Durable orchestration is stuck or behaves nondeterministically. Root cause: The orchestrator violates determinism (used DateTime.Now, Guid.NewGuid(), direct I/O, or awaited a non-durable task), so replay diverges from history; or the task-hub storage is throttled. Confirm: Query the orchestration status/history (az rest to the Durable status endpoint, or the Durable Functions monitor); look for replay errors; check the storage account metrics for throttling. Fix: Remove all non-deterministic calls from the orchestrator (move I/O to activities, use ctx.CurrentUtcDateTime/ctx.NewGuid()); if storage is the bottleneck, scale it or switch the Durable backend (Netherite/MSSQL).

10. Binding resolves to null / function isn’t found. Root cause: A binding expression name doesn’t match the trigger property (case-sensitive), or a required connection app setting is missing, so the function fails to index. Confirm: Startup logs show “no functions found” or an indexing error; the binding parameter is null at runtime. Fix: Match {property} exactly to the trigger’s field; set the binding’s connection app setting (<Name>__serviceUri or connection string); redeploy and re-check the function list.

11. 403 / auth failures calling a dependency (DB, Key Vault, Storage). Root cause: The managed identity isn’t enabled, or lacks the data-plane RBAC role on the target, or the target’s firewall blocks the app’s outbound. Confirm: az functionapp identity show; az role assignment list --assignee <principalId> --scope <targetId>; the target’s networking blade. Fix: Assign the identity; grant the data role (e.g. Storage Blob Data Contributor, Key Vault Secrets User), not just control-plane Reader; allow the app’s subnet/outbound through the target firewall (private endpoint preferred).

12. Costs higher than expected on Consumption. Root cause: A chatty trigger (a queue that’s never empty, an aggressively-polling timer) or a function that runs far more often/longer than assumed; or a runaway retry loop reprocessing poison messages. Confirm: App Insights execution count × duration; the cost analysis blade filtered to the function app; check poison-queue churn. Fix: Reduce invocation frequency (batch, raise polling interval), shorten execution, fix retry loops, and consider Premium if steady load makes per-execution pricing lose to a flat plan.
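The idempotency fix from item 2 can be sketched in a few lines. This is a minimal illustration, not the Functions SDK: the Message and IdempotentHandler names are hypothetical, and the in-memory set and dict stand in for a durable dedup store and a real downstream.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    message_id: str
    order_id: str
    amount: int


@dataclass
class IdempotentHandler:
    # Hypothetical in-memory stand-ins. A real dedup store must be durable
    # and shared across instances (a table, a cache, or a unique DB key).
    processed_ids: set = field(default_factory=set)
    charges: dict = field(default_factory=dict)

    def handle(self, msg: Message) -> bool:
        """Apply the message once; return False for a redelivery."""
        if msg.message_id in self.processed_ids:
            return False  # duplicate delivery: already applied, skip
        # Upsert by natural key, so a repeated write cannot add a second charge.
        self.charges[msg.order_id] = msg.amount
        self.processed_ids.add(msg.message_id)
        return True


handler = IdempotentHandler()
msg = Message(message_id="m-1", order_id="o-42", amount=500)
first = handler.handle(msg)
second = handler.handle(msg)  # simulated redelivery after a lock expiry
print(first, second, handler.charges)
```

The check and the write here are not atomic, which is acceptable in a single-threaded sketch but not across instances. In production the dedup record and the side effect should commit together, or the write itself should be conditional (a unique key or an ETag check).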

Best practices

Security notes

The security controls and what each prevents:

Control | Mechanism | Secures against | Also prevents
Managed identity + RBAC | Identity + data role | Secrets in plaintext settings | Rotation breaking the app
Key Vault references | @Microsoft.KeyVault(...) | Secret values in config | Hand-rolled secret handling
Easy Auth / APIM in front | Entra ID / APIM policy | Anonymous abuse of HTTP funcs | Key-only “auth” being bypassed
VNet integration + PE | Private outbound/inbound | Public-internet exposure of deps | Data exfil over public paths
Storage hardening | Firewall + identity, no shared key | Tampering with runtime/task hub | Host-takeover via storage
Run-from-package + scanning | Immutable, scanned artifact | Tampered/unknown code | Surprise breaking deploys

Cost & sizing

What drives the bill, by plan, and what each cost lever buys:

Cost driver | What you pay for | Rough INR/month (illustrative) | When it dominates
Consumption executions + GB-s | Per-run + memory×time (free grant first) | ₹0–3,000 for spiky/low traffic | Bursty, low-to-moderate volume
Flex alwaysReady instances | Warm pool (per instance) | ~₹3,000–6,000 per warm instance | Killing cold start on hot paths
Premium EP1 (1 instance) | Always-on vCPU/GB | ~₹12,000–18,000 | Steady load, warm + VNet + long runs
Dedicated (shared plan) | Plan instance-hours | Marginal if plan exists | Co-located with web apps
Backing storage | Transactions + capacity | ~₹200–1,500 | High trigger/Durable churn
App Insights ingestion | Per-GB telemetry | ~₹1,000–3,000 | High-volume tracing (sample it)

Sizing guidance: start on Consumption/Flex and measure; if your monthly execution × duration cost approaches the price of an EP1, or you keep fighting cold starts/VNet, move to Premium. Keep functions short (GB-s is memory × duration, so halving duration halves that part of the bill), batch where it cuts invocation count, and enable Application Insights adaptive sampling so a traffic spike doesn’t spike the telemetry bill. The biggest hidden cost is a retry/poison loop that keeps reprocessing bad messages indefinitely; alert on poison depth so it never runs up the meter. For the broader cost-control workflow, see Azure FinOps & Cost Management at Scale.
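The break-even check in the sizing guidance is simple arithmetic, sketched below. Everything numeric here is an assumption for illustration: the rates in RATES are placeholders (not Azure list prices), FLAT_PLAN_PRICE is picked from the illustrative EP1 range in the table, and the 0.8 threshold is one hypothetical reading of "approaches". Substitute the current rates for your region and currency before relying on the result.

```python
def consumption_monthly_cost(executions, avg_duration_s, memory_gb,
                             price_per_million_exec, price_per_gb_s,
                             free_executions=0, free_gb_s=0.0):
    """Estimate a month of Consumption billing: executions plus GB-seconds."""
    gb_seconds = executions * avg_duration_s * memory_gb
    # The free grant is subtracted first; usage below it costs nothing.
    billable_exec = max(0, executions - free_executions)
    billable_gb_s = max(0.0, gb_seconds - free_gb_s)
    exec_cost = billable_exec / 1_000_000 * price_per_million_exec
    return exec_cost + billable_gb_s * price_per_gb_s


def prefer_flat_plan(consumption_cost, flat_plan_price, threshold=0.8):
    """True once per-execution cost reaches `threshold` of the flat price."""
    return consumption_cost >= threshold * flat_plan_price


# Placeholder rates and flat price (assumptions, not real pricing).
RATES = dict(price_per_million_exec=20.0, price_per_gb_s=0.002)
FLAT_PLAN_PRICE = 15_000.0

# 0.5 s average duration at 0.5 GB, for two monthly volumes.
light = consumption_monthly_cost(10_000_000, 0.5, 0.5, **RATES)
heavy = consumption_monthly_cost(30_000_000, 0.5, 0.5, **RATES)
print(round(light), prefer_flat_plan(light, FLAT_PLAN_PRICE))
print(round(heavy), prefer_flat_plan(heavy, FLAT_PLAN_PRICE))
```

With these placeholder inputs the GB-s term dominates the per-execution term, which is why shortening duration or trimming memory usually moves the bill more than shaving invocation count.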

Interview & exam questions

1. What is the difference between a trigger and a binding? A trigger is the single event that starts a function and supplies its payload (and is the scaling signal); a binding is a declarative input or output connection to a service. Every function has exactly one trigger and zero or more input/output bindings; the trigger is technically a special input binding (in function.json it has direction in and a type ending in Trigger, such as queueTrigger).

2. When would you choose Flex Consumption over Consumption? When you need scale-to-zero and features Consumption lacks — chiefly VNet integration (private outbound to databases/PaaS), alwaysReady warm instances to eliminate cold start, per-instance memory sizing, and per-instance concurrency control. Flex is the modern serverless default; plain Consumption is for the simplest spiky glue.

3. Why might a function execute the same message twice, and how do you make that safe? Queue/Service Bus/Event Hubs/Event Grid triggers deliver at least once; a crash or lock/visibility expiry mid-processing causes redelivery. You make it safe with idempotency — a dedup store keyed on the message id, an upsert by natural key, a conditional (ETag) write, or an idempotent downstream — so processing twice has the same effect as once.

4. What is a cold start and which plans eliminate it? Cold start is the latency the first request on a freshly created instance pays (sandbox allocation, worker start, code load, connection priming). Premium eliminates it with pre-warmed instances; Flex Consumption removes it for the hot path with alwaysReady instances; Dedicated stays warm (Always On). Plain Consumption cannot fully avoid it.

5. Why must Durable orchestrator functions be deterministic? The platform checkpoints an orchestrator’s progress and replays the function from the start to rebuild state after an await or restart. If the code uses non-deterministic operations (DateTime.Now, Guid.NewGuid(), direct I/O, non-durable awaits), replay diverges from the recorded history and the orchestration breaks. Use ctx.CurrentUtcDateTime, ctx.NewGuid(), and put all side effects in activity functions.

6. Describe the fan-out/fan-in pattern in Durable Functions. The orchestrator starts many activity functions in parallel (e.g. one per item), collecting their tasks, then awaits all of them (Task.WhenAll) and aggregates the results. It’s the pattern for parallelizing independent work and then reconciling — far simpler than hand-rolling parallel queue workers plus a join.

7. What is the maximum execution time on the Consumption plan, and what do you do about a longer job? 10 minutes (5-minute default, 10-minute hard cap). For longer work, move to Premium/Flex (long or unbounded timeout) or refactor to Durable Functions using the async HTTP pattern — return 202 immediately and run the long workflow as checkpointed orchestrator/activity steps that aren’t bound by a single function’s timeout.

8. How does the scale controller decide how many instances to run, and what caps it? It watches each trigger’s scaling signal — HTTP request rate, queue length, Event Hubs/Cosmos partition lag — and adds/removes instances in steps from zero to the plan max. The cap is the plan’s maximum plus trigger-specific ceilings: Event Hubs/Cosmos give one instance per partition, and you can set functionAppScaleLimit/WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT to bound it.

9. What happens to a message that keeps failing, on Storage queues vs Service Bus? On Storage queues, after maxDequeueCount (default 5) the message is moved to a <queue>-poison queue. On Service Bus, after maxDeliveryCount (default 10) it goes to the built-in dead-letter sub-queue. In both cases you must monitor and drain these or failures accumulate silently.

10. Why is the backing storage account so important to a function app? It holds runtime metadata, trigger leases/checkpoints, the Durable task hub, and (some plans) the deployment package via the AzureWebJobsStorage connection. If it’s unreachable, throttled, or its key rotates without updating the setting, the host fails to start (“runtime unreachable”) — a large share of “Functions is down” is really a storage problem.

11. How do you give a function passwordless access to Cosmos DB or Key Vault? Enable a managed identity on the function app and grant it the data-plane RBAC role on the target (e.g. Cosmos DB Built-in Data Contributor, Key Vault Secrets User), then use an identity-based connection (for example <Name>__accountEndpoint for Cosmos DB or <Name>__serviceUri for Storage, adding <Name>__credential=managedidentity and <Name>__clientId for a user-assigned identity) or a Key Vault reference, with no connection strings or keys in app settings.

12. When is Azure Functions the wrong choice? For long-running, steady high-CPU compute (you’d pay more than a reserved VM and fight timeouts), ultra-low-latency APIs with a hard sub-100 ms p99 and zero cold-start tolerance, and workloads needing persistent local state or disk across invocations. Those fit App Service, AKS, Container Apps with min replicas, or VMs better.
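The poison-message behaviour from question 9 can be simulated without any Azure dependency. This is a sketch of the idea only: drain and flaky_handler are hypothetical names, the list-based queues are stand-ins, and the default of 5 mirrors the Storage queue maxDequeueCount default quoted above.

```python
from collections import deque


def drain(messages, handler, max_dequeue_count=5):
    """Process messages; set aside any that fail max_dequeue_count times."""
    pending = deque((m, 0) for m in messages)
    done, poison = [], []
    while pending:
        message, dequeue_count = pending.popleft()
        dequeue_count += 1
        try:
            handler(message)
            done.append(message)
        except Exception:
            if dequeue_count >= max_dequeue_count:
                poison.append(message)  # stop retrying, keep for inspection
            else:
                pending.append((message, dequeue_count))  # redeliver later
    return done, poison


attempts = {}


def flaky_handler(message):
    # Hypothetical handler that always rejects one malformed message.
    attempts[message] = attempts.get(message, 0) + 1
    if message == "bad":
        raise ValueError("malformed payload")


done, poison = drain(["a", "bad", "b"], flaky_handler)
print(done, poison, attempts["bad"])
```

The bounded retry count is what keeps a bad message from looping forever. The other half of the answer is operational: something has to watch the poison list and drain it.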

These questions map to AZ-204 (Developer Associate): implement Azure Functions (triggers, bindings, Durable Functions) and develop event-based and message-based solutions. AZ-104 touches the hosting/scaling/monitoring angle, and the networking/identity content (VNet integration, managed identity, private endpoints) reaches AZ-500/AZ-700. A compact cert mapping:

Question theme | Primary cert | Objective area
Triggers, bindings, Durable patterns | AZ-204 | Implement Azure Functions; event/message solutions
Plans, scaling, cold start | AZ-204 / AZ-104 | Implement & configure compute
Idempotency, poison/dead-letter | AZ-204 | Message-based solutions
Managed identity, Key Vault refs | AZ-204 / AZ-500 | Secure solutions; manage identity
VNet integration, private endpoints | AZ-700 | Design & implement network connectivity
Monitoring with App Insights | AZ-204 | Instrument, monitor & troubleshoot

Quick check

  1. Your HTTP-triggered function needs to reach a private Azure SQL database and you want scale-to-zero. Which plan, and why not plain Consumption?
  2. A queue-triggered function occasionally charges a customer twice. What property of the trigger explains this, and what’s the fix?
  3. True or false: adding more instances will fix an Event Hubs trigger that can’t keep up with its backlog.
  4. Your Durable orchestrator works on first run but behaves erratically after the host restarts mid-workflow. Name two things in the orchestrator code to check.
  5. The function app shows “Azure Functions runtime is unreachable” and nothing runs. What’s the most likely root cause?

Answers

  1. Flex Consumption — it offers scale-to-zero and VNet integration (so it can reach the private SQL endpoint), plus alwaysReady to kill cold start. Plain Consumption can’t VNet-integrate, so it can’t reach the private database.
  2. The trigger delivers at least once; a crash or lock/visibility expiry mid-process causes redelivery, so the same message is processed twice. Fix with idempotency — e.g. a dedup store keyed on the order id checked before charging, or an idempotency key on the payment call.
  3. False. Event Hubs scales to at most one instance per partition, so the partition count is the ceiling regardless of instance settings. Add partitions (at the source) or process larger batches faster; more instances alone won’t help.
  4. Check that the orchestrator (a) uses ctx.CurrentUtcDateTime/ctx.NewGuid() instead of DateTime.Now/Guid.NewGuid(), and (b) performs no direct I/O and only awaits durable APIs (all side effects moved into activity functions). Non-determinism breaks replay.
  5. The backing storage account (AzureWebJobsStorage) is broken or unreachable — a rotated access key not updated in the setting, a firewall now blocking the app, or a deleted account/container. Repair the connection (prefer identity-based) and the host starts.
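Answer 4 is easier to see with a toy replay. The sketch below is not the Durable Functions API: ReplayContext and its new_guid method are hypothetical stand-ins for the real orchestration context, and deriving the value from the instance id plus a counter is only an assumption about how a replay-safe GUID could be produced.

```python
import uuid


class ReplayContext:
    """Toy orchestration context whose GUIDs are stable across replays."""

    def __init__(self, instance_id):
        self.instance_id = instance_id
        self._counter = 0

    def new_guid(self):
        # Derived from the instance id and call order, so a replay of the
        # same orchestration produces the same sequence of values.
        self._counter += 1
        name = f"{self.instance_id}:{self._counter}"
        return uuid.uuid5(uuid.NAMESPACE_OID, name)


def orchestrator_bad(ctx):
    return str(uuid.uuid4())  # fresh random value on every replay


def orchestrator_good(ctx):
    return str(ctx.new_guid())  # same value on every replay


def run_then_replay(orchestrator, instance_id):
    """Run once, then run again from the start as a replay would."""
    first = orchestrator(ReplayContext(instance_id))
    replay = orchestrator(ReplayContext(instance_id))
    return first, replay


bad_first, bad_replay = run_then_replay(orchestrator_bad, "order-42")
good_first, good_replay = run_then_replay(orchestrator_good, "order-42")
print(bad_first == bad_replay, good_first == good_replay)
```

The same reasoning applies to the clock and to I/O: anything the orchestrator computes from outside its recorded history can differ on replay, so it belongs in an activity or behind the context.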

Glossary

Next steps

You can now choose a plan, wire triggers and bindings, reason about scale and cold starts, orchestrate with Durable Functions, and make handlers idempotent and observable.
