Optimizing AWS Lambda Performance: Cold Starts, Provisioned Concurrency, SnapStart, and Memory Tuning

“Lambda is slow” is almost never true. What is true is that an under-tuned function pays for a cold start it could have priced away, runs on a fraction of a vCPU because someone set 128 MB and forgot, and opens a fresh database connection on every invocation because the handler does its work in the wrong scope. Latency on AWS Lambda — the event-driven compute service that runs your code in ephemeral, auto-scaled execution environments — is a tuning problem, not a platform limit. The platform gives you a precise set of levers; the skill is reaching for them in the right order and proving each one moved the number you care about.

This guide walks those levers in order of leverage: understand the cold start, tune memory (which is also CPU), then decide whether provisioned concurrency or SnapStart is justified, fix connection reuse, and plan concurrency so a load spike does not turn into a wall of throttles. Because this is a reference you will return to mid-incident — when p99 has blown past your SLA at 18:03 on a flash-sale Friday — the playbook itself, the metrics, the runtime support matrix, the limits and the cost drivers are all laid out as scannable tables. Read the prose once to build the mental model, then keep the tables open when you are actually tuning.

By the end you will stop guessing. You will know whether a slow request is an oversized init phase, a CPU-starved 128 MB function, a spillover cold start past your provisioned pool, a SnapStart restore that cloned a UUID across every environment, a connection blowup against max_connections, or a throttle wall you could have seen coming in the Throttles metric. Knowing which — from the REPORT line and three CloudWatch metrics — is what separates a five-minute tune from a two-hour stare at the wrong dashboard.

What problem this solves

Serverless promises you ship a handler and forget the fleet. That abstraction is a gift until latency matters, and then the very thing that makes Lambda elastic — spinning a fresh execution environment on demand — becomes the thing your p99 trips over. A synchronous API behind API Gateway has a hard ceiling (29 seconds at the gateway), a card network might hold you to a contractual p99 under 800 ms, and a flash sale can ask for 10x your steady-state concurrency in ninety seconds. Under that pressure the defaults betray you: 128 MB of memory means a sliver of a vCPU, no provisioned warmth means every scale-out instance cold-starts, and a handler that opens a connection per call melts your database long before Lambda itself complains.

What breaks without this knowledge is predictable and expensive. An engineer “fixes” a latency spike by nailing provisioned concurrency to a flat 300 around the clock — paying for 300 warm JVMs at 3 AM for a daytime workload — and the finance team notices. Another sets 128 MB to “save money” and the CPU-bound function runs 14x slower at the same GB-second cost, so the bill is identical and the latency is terrible. A third moves heavy init into the handler, so every warm invocation re-opens a connection, and at 500 concurrent environments the relational database hits max_connections and starts refusing everyone. None of these are platform limits. Every one is a tuning decision made blind.

Who hits this: anyone running Lambda on a latency-sensitive or high-concurrency path. It bites hardest on JVM and .NET functions (heavy init, JIT/class-load tax on cold start), synchronous APIs with a strict tail-latency SLO, relational-database-backed functions at scale (connection exhaustion), and spiky workloads (throttles and burst-limit walls). The fix is almost never “use more memory blindly” or “buy warmth everywhere” — it is measure the init, right-size first because it is free, then buy down only the cold start that remains on the paths whose SLO actually requires it.

To frame the whole field before the deep dive, here is every latency symptom this guide addresses, the question it forces, and the first lever to reach for:

| Symptom | What is actually happening | First question to ask | First place to look | First lever |
|---|---|---|---|---|
| Slow first request after idle | Fresh environment paying init | Is this cold (has Init Duration) or warm? | REPORT line in CloudWatch Logs | Trim init; consider PC/SnapStart |
| Function feels slow, no SLO pressure | Under-provisioned CPU at low memory | Is it CPU-bound at the current memory? | Lambda Power Tuning sweep | Right-size memory (free) |
| High p99 on a synchronous API | Cold starts on the critical path | Is p99 driven by Init Duration? | X-Ray init subsegment; Logs Insights | Provisioned concurrency on the alias |
| JVM/.NET cold starts dominate, cost-sensitive | Class-load + JIT tax per cold env | Is the runtime SnapStart-eligible here? | Runtime + region support matrix | SnapStart with priming hooks |
| DB connection errors at scale | New connection per invocation | Are connections opened in the handler? | RDS connection count vs cap | Init-scope reuse + RDS Proxy |
| 429s / requests dropped under spike | Concurrency limit / burst ceiling hit | Reserved cap, account cap, or burst? | Throttles + ConcurrentExecutions | Reserve to partition; raise quota |

Learning objectives

By the end of this article you can:

Prerequisites & where this fits

You should already be comfortable authoring and deploying a Lambda function (a handler, a deployment package or container image, an execution role), reading JSON, and running the aws CLI. You should understand what an alias and a version are, that API Gateway can front a function synchronously, and the basics of how Lambda scales (one environment serves one request at a time; concurrency is the count of in-flight executions). Familiarity with a VPC, security groups, and a relational database connection pool helps for the RDS Proxy material.

This sits in the performance and cost-optimization layer of the serverless track. The mechanics underneath it — runtimes, triggers, layers, the full concurrency model — are covered in AWS Lambda, In Depth: Runtimes, Triggers, Layers, Concurrency & Every Setting, which is upstream of this article. The synchronous entry path and its 29-second ceiling come from Amazon API Gateway, In Depth: REST vs HTTP vs WebSocket APIs, Integrations & Authorizers. The connection-pooling fix is a deep topic in its own right — see RDS Proxy in Production: Connection Pooling, Failover Acceleration, and IAM Authentication. And the measurement layer that proves every lever worked lives in AWS Observability, In Depth: CloudWatch, CloudTrail, Config & EventBridge and Distributed Tracing on AWS with X-Ray: Service Maps, Segments, and ADOT on EKS.

A quick map of who owns which lever, so you pull the right one and call the right person:

| Layer | What lives here | Who usually owns it | Latency failure it can cause |
|---|---|---|---|
| Client / API Gateway | TLS, request routing, 29 s timeout | Frontend / API team | Timeout if function exceeds 29 s; retries amplify load |
| Function config | Memory, timeout, alias, PC, SnapStart | App / platform team | Slow CPU at low memory; cold starts; spillover |
| Function code (init scope) | Imports, SDK clients, connections | App / dev team | Oversized init; per-invocation connection blowup |
| Concurrency controls | Reserved, provisioned, account quota | Platform / SRE | Throttles (429); burst-ceiling wall |
| Downstream (RDS / DynamoDB) | Connection pool, capacity | Data team | max_connections exhaustion; dependency latency |
| Observability | Logs, metrics, traces | SRE / platform | Tuning blind; can’t prove a change worked |

Core concepts

Five mental models make every later decision obvious.

A cold start is the work before your handler runs on a fresh environment. When Lambda needs a new execution environment it does three things: provisions the microVM and pulls your package or image, runs your init phase (everything outside the handler: imports, SDK clients, static config, connection setup), then runs your handler. The first two parts happen once per environment, and of the two only the init phase is billed; after the first invocation the environment is reused (a warm invocation) until it is recycled. “Cold start” is those first two parts; everything you do to fight latency is either making them cheaper, making them happen ahead of traffic, or avoiding them entirely.

Memory is CPU. Lambda allocates CPU proportionally to the memory you configure. At 1,769 MB a function gets the equivalent of one full vCPU; below that you get a fraction, above it more than one (up to ~6 vCPUs at 10,240 MB). A CPU-bound function at 128 MB is not “cheap” — it runs roughly 14x slower than at 1,769 MB, and because Lambda bills GB-seconds (memory × duration), the slower run can cost the same or more while delivering far worse latency. This is the single highest-leverage knob and the most misunderstood.

Warmth is something you buy, two ways. Provisioned concurrency keeps a pool of environments fully initialized and ready, so the init phase has already happened before traffic arrives — you pay for that warmth continuously, whether or not it is invoked. SnapStart instead runs init once at publish time, snapshots the initialized microVM, and restores from the snapshot on cold start — no idle charge, but you inherit snapshot caveats (cloned uniqueness, stale connections, JIT priming). They solve the same problem with opposite cost shapes.

Concurrency is finite and shared. Concurrency is the number of in-flight executions. Your account has a regional concurrency limit (1,000 by default, raisable). Reserved concurrency carves a guaranteed-and-capped slice out of that pool for one function; provisioned concurrency is a pre-warmed subset of reserved. There is also a burst ceiling governing how fast you can scale from cold. Exceed any of these and Lambda throttles — and how that throttle surfaces depends on how the function was invoked.

Connections must live in init scope. Anything expensive to create — a database connection, an HTTP client, a secret fetch — belongs outside the handler, in init scope, so it is created once per environment and reused by every warm invocation. Put it inside the handler and it runs on every call, adding latency and, at scale, exhausting downstream connection limits. This single discipline prevents most self-inflicted Lambda latency.

The vocabulary in one table

Before the deep sections, pin down every moving part. The glossary repeats these for lookup; this is the model side by side:

| Concept | One-line definition | Where it lives | Why it matters to latency |
|---|---|---|---|
| Execution environment | The microVM that runs one invocation at a time | Lambda-managed | Cold when new; warm when reused |
| Init phase | Code outside the handler, run once per env | Your code (module scope) | Dominates cold-start cost you control |
| Cold start | Env provision + init before first invoke | Lifecycle | The latency you are fighting |
| Warm invoke | Handler on a reused environment | Lifecycle | The fast path; skips init |
| Memory (MB) | Configured RAM; also sets CPU share | Function config (128–10240) | More memory = more CPU = faster |
| Provisioned concurrency | Pre-initialized, always-ready pool | On a version/alias | Removes cold start; idle cost |
| SnapStart | Snapshot-restore instead of re-init | On published versions | Removes most init; no idle cost |
| Reserved concurrency | Guaranteed + capped slice of the account | Per function | Partitions; protects downstreams |
| Throttle | Invocation rejected at a concurrency limit | Runtime behaviour | 429 sync; retries async |
| RDS Proxy | Pools/multiplexes DB connections | In front of RDS/Aurora | Stops connection blowup at scale |
| GB-second | Billing unit: memory × duration | Billing | Why slow-and-small ≠ cheap |

Anatomy of a cold start

A cold start has three measurable parts, and only one of them is fully yours to control. Knowing which part dominates tells you which lever to pull.

| Part | What happens | Billed? | Who controls it | How to reduce |
|---|---|---|---|---|
| Download / env init | Provision microVM, pull package/image, start runtime | No (platform) | Mostly AWS; you affect package/image size | Smaller artifact; zip over large container; fewer layers |
| Init phase (your code) | Imports, SDK clients, static config, connections | Yes | You (module scope) | Trim deps, lazy-init, mark SDK external |
| Invoke (warm path) | Your handler body | Yes | You | Right-size memory; efficient code |

The init phase is where you have the most leverage, and two things dominate it: package size and what your code does at import time. A 250 MB unzipped bundle that eagerly constructs a dozen SDK clients and reads SSM parameters synchronously will have an init phase measured in seconds. Trim both.

# What is actually in the bundle? Init time tracks closely with this.
unzip -l function.zip | tail -1

# For Node, prune dev deps and bundle/tree-shake so only used code ships
npm prune --omit=dev
npx esbuild src/handler.js --bundle --minify --platform=node \
  --target=node20 --external:@aws-sdk/* --outfile=dist/handler.js

The AWS SDK v3 (@aws-sdk/*) and boto3 are already present in the managed runtimes. Marking the SDK --external and not bundling it keeps your artifact small. Pin to a layer only if you need behaviour the runtime’s bundled SDK lacks.

You read the init duration directly from the REPORT line in CloudWatch Logs — Init Duration appears only on cold-start invocations, which makes it a clean signal to filter on. Here is exactly what each REPORT field tells you and how to act on it:

| REPORT field | What it measures | Read it as | Action if it’s high |
|---|---|---|---|
| Init Duration | Time in your init phase (cold only) | Cold-start cost you own | Trim package/init; PC or SnapStart |
| Duration | Handler execution time | Warm-path latency | Right-size memory; profile code |
| Billed Duration | What you pay for (rounded up to 1 ms) | The bill driver | Lower memory only if not CPU-bound |
| Max Memory Used | Peak memory of the invocation | Headroom vs configured | Drop memory if far below; raise if near |
| Memory Size | Configured memory | Your setting | The knob you tune |
| XRAY TraceId (if active) | Trace correlation | Where to drill in X-Ray | Open the trace for segment breakdown |
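
To make the REPORT fields above concrete, here is a minimal sketch of pulling them out of a log line and flagging cold starts by the presence of Init Duration. The sample line is made up for illustration, and `parse_report` is a hypothetical helper name rather than anything AWS ships.

```python
import re

# Matches "Name: 12.34 ms" or "Name: 128 MB" pairs inside a REPORT line.
# Longer names come first so "Duration" does not shadow "Init Duration".
_FIELD = re.compile(
    r"(Init Duration|Billed Duration|Duration|Memory Size|Max Memory Used): ([\d.]+) (ms|MB)"
)


def parse_report(line: str) -> dict:
    """Return the numeric REPORT fields plus a cold-start flag."""
    fields = {name: float(value) for name, value, _unit in _FIELD.findall(line)}
    # Init Duration only appears on cold-start invocations.
    fields["cold_start"] = "Init Duration" in fields
    return fields


# Illustrative sample, not copied from a real function's logs.
sample = (
    "REPORT RequestId: 8f5c0a1e-0000-0000-0000-000000000000\t"
    "Duration: 102.25 ms\tBilled Duration: 103 ms\t"
    "Memory Size: 1024 MB\tMax Memory Used: 75 MB\tInit Duration: 250.11 ms"
)
report = parse_report(sample)
print(report["cold_start"], report["Init Duration"], report["Max Memory Used"])
```

Filtering on the `cold_start` flag gives you the cold-only population to compute an init percentile over, which is the number the later levers are meant to move.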

Two cold-start facts worth internalizing, because they shape everything downstream. First, the init phase runs with a brief CPU boost in unprovisioned environments — AWS gives init extra CPU regardless of your memory setting — which is why a heavy init isn’t quite as slow as the same work mid-handler, but it is still billed and still on the critical path. Second, init has a 10-second soft budget before the platform may retry the initialization; an init that legitimately needs longer is a design smell. The init-cost contributors, ranked:

| Init cost | Typical magnitude | Reduce it by | Trade-off |
|---|---|---|---|
| Package / image pull | 100 ms – several s (size-dependent) | Smaller artifact; zip vs big container; layer hygiene | Build discipline |
| Runtime boot | 50 ms – 1 s | Lighter runtime; avoid heavy frameworks | Framework features lost |
| SDK client construction | 50–500 ms each | Construct in init scope once; only what you use | Slightly more module code |
| Synchronous config fetch (SSM/Secrets) | 50 ms – seconds | Cache; fetch fewer params; batch | Less granular config refresh |
| Framework / DI graph (Spring etc.) | 1–10+ s (JVM/.NET) | SnapStart + priming; lighter framework | Complexity; framework lock-in |
| First DB connect / pool prime | 50 ms – seconds | Init scope; pooled driver; RDS Proxy | First real request still primes |

Memory is CPU: right-size with Lambda Power Tuning

This is the highest-leverage knob and the one most teams get wrong by guessing. Because CPU scales with memory, a CPU-bound function at 128 MB is slow and not actually cheaper: it runs longer at a lower memory setting, so the GB-seconds you are billed for come out about the same. Do not guess. Run AWS Lambda Power Tuning, an open-source Step Functions state machine that invokes your function across a memory sweep and plots cost against speed. (Step Functions itself is covered in AWS Step Functions in Production: Express vs Standard, Distributed Map, and Resilient Error Handling.)

# Deploy the tuner from a local copy of its SAM template (it is also published in the Serverless Application Repository)
sam deploy \
  --template-file template.yaml \
  --stack-name lambda-power-tuning \
  --capabilities CAPABILITY_IAM \
  --parameter-overrides "PowerValues=128,256,512,1024,1536,1769,3008"

Then start an execution of the tuner’s state machine with an input like this:

{
  "lambdaARN": "arn:aws:lambda:us-east-1:111122223333:function:order-processor",
  "powerValues": [128, 256, 512, 1024, 1536, 1769, 3008],
  "num": 50,
  "payload": { "orderId": "test-123" },
  "strategy": "balanced"
}

The strategy you pick changes what the tuner optimizes for — choose it for the path’s actual goal, not by reflex:

| Strategy | Optimizes for | Use it when | Risk if misused |
|---|---|---|---|
| cost | Cheapest acceptable config | Batch / async, no latency SLO | Picks low memory → slow for users |
| speed | Fastest config | Latency-critical synchronous path | Overspends on memory you don’t need |
| balanced | Best cost-vs-speed tradeoff | Default; most functions | May miss a strict p99 target |

The single most important rule: tune with a representative payload. A synthetic empty event under-exercises the function and lies about the optimum; a real-shaped payload reveals the true CPU profile. I have repeatedly found that moving a JSON-crunching function from 512 MB to 1024 MB more than halves duration, which lowers cost because doubling the memory is only cheaper when the work finishes in less than half the time. Memory is also the only knob that changes both axes at once. Most levers trade cost for latency; this one can improve both:

| Memory | Approx vCPU share | Best for | Cost note |
|---|---|---|---|
| 128 MB | ~0.07 vCPU (a sliver) | Trivial glue, no CPU work | “Cheap” only if truly I/O-bound and idle-fast |
| 512 MB | ~0.29 vCPU | Light transforms | Often slower and not cheaper than 1024 for CPU work |
| 1024 MB | ~0.58 vCPU | Common sweet spot for APIs | Frequently the balanced-strategy winner |
| 1769 MB | ~1.00 vCPU (full) | CPU-bound work; JVM | Below this, single-threaded code can’t use a full core |
| 3008 MB | ~1.70 vCPU | Parallel / heavy compute | More cores; watch GB-second cost |
| 10240 MB | ~6 vCPU (max) | Multi-threaded, compute-heavy | Max CPU; only if the code parallelizes |
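
The arithmetic behind the table can be sketched in a few lines, assuming CPU share is simply memory divided by 1,769 MB (a proportional approximation of the rule above) and using made-up durations for a hypothetical CPU-bound function. The helper names are illustrative.

```python
FULL_VCPU_MB = 1769  # memory setting at which Lambda allocates one full vCPU


def vcpu_share(memory_mb: int) -> float:
    """Approximate vCPU share, assuming CPU scales linearly with memory."""
    return memory_mb / FULL_VCPU_MB


def gb_seconds(memory_mb: int, duration_ms: float) -> float:
    """Billing unit: configured memory (GB) multiplied by duration (s)."""
    return (memory_mb / 1024) * (duration_ms / 1000)


# Hypothetical measurements: illustrative durations, not benchmark results.
small = gb_seconds(512, 800)    # 512 MB, 800 ms
large = gb_seconds(1024, 350)   # 1024 MB, 350 ms (more than twice as fast)

print(round(vcpu_share(128), 2), round(vcpu_share(1024), 2))
print("1024 MB is cheaper:", large < small)
```

With these numbers the larger setting is both faster and cheaper, because the duration fell by more than half. Had it fallen to exactly 400 ms, the GB-seconds would have been equal and only latency would have improved.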

Apply the winner explicitly, and verify it took:

aws lambda update-function-configuration \
  --function-name order-processor --memory-size 1024

aws lambda get-function-configuration \
  --function-name order-processor --query 'MemorySize'

# Terraform: pin the tuned memory as code
resource "aws_lambda_function" "order_processor" {
  function_name = "order-processor"
  role          = aws_iam_role.lambda.arn
  handler       = "handler.handler"
  runtime       = "nodejs20.x"
  memory_size   = 1024   # from Power Tuning, balanced strategy
  timeout       = 10
}

The common right-sizing mistakes, and the REPORT/Power-Tuning evidence that exposes each:

| Mistake | What you see | Evidence | Fix |
|---|---|---|---|
| Stuck at 128 MB “to save money” | High Duration, same/worse cost | CPU-bound; duration drops sharply with memory | Move to the Power-Tuning winner |
| Over-allocated memory | Max Memory Used far below Memory Size | Logs Insights peakMemMB | Drop memory toward peak + headroom |
| Tuned with empty payload | “Optimum” disagrees with prod latency | p99 in prod ≠ tuner result | Re-tune with a representative event |
| One size for all functions | Some over-, some under-provisioned | Per-function peakMemMB spread | Tune each function independently |

Provisioned concurrency: pre-warmed capacity

If your tuned function still cannot tolerate cold starts on the critical path — a synchronous API behind API Gateway, a checkout flow — provisioned concurrency (PC) keeps a pool of environments initialized and ready, so the init phase has already happened before traffic arrives. It is configured against a version or alias — never $LATEST, which forces a clean deploy-then-shift model.

# Publish an immutable version, then point PC at the alias
aws lambda publish-version --function-name order-processor

aws lambda update-alias \
  --function-name order-processor \
  --name live \
  --function-version 42   # the Version that publish-version returned

aws lambda put-provisioned-concurrency-config \
  --function-name order-processor \
  --qualifier live \
  --provisioned-concurrent-executions 20

Static provisioning wastes money outside peak. Drive it with Application Auto Scaling on a schedule or a utilization target so you pay for warmth only when you need it:

aws application-autoscaling register-scalable-target \
  --service-namespace lambda \
  --resource-id function:order-processor:live \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --min-capacity 5 --max-capacity 100

aws application-autoscaling put-scaling-policy \
  --service-namespace lambda \
  --resource-id function:order-processor:live \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --policy-name pc-utilization \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 0.7,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
    }
  }'

Key facts to internalize: you pay for provisioned concurrency for the time it is enabled, whether or not it is invoked, plus the standard per-request charge and a reduced duration charge when it is used. If demand exceeds your provisioned pool, the overflow spills to standard on-demand concurrency and those requests do cold-start. The ProvisionedConcurrencySpilloverInvocations metric is your floor-too-low alarm. Here is every PC configuration knob and how to reason about it:

| Setting | What it controls | Values | When to change | Gotcha |
|---|---|---|---|---|
| --qualifier | Version/alias PC attaches to | a version number or alias | Always an alias for deploy-shift | $LATEST is rejected, by design |
| --provisioned-concurrent-executions | Size of the warm pool | 1 … (≤ reserved/account) | Match the warm floor you need | Counts against account concurrency |
| AutoScaling min-capacity | Floor PC never drops below | ≥ 0 | Always-on readiness baseline | Too high = idle waste |
| AutoScaling max-capacity | Ceiling for scale-up | ≤ account limit | Cap the spend / protect downstreams | Too low = spillover under spike |
| TargetValue (utilization) | Target PC utilization | 0.0–1.0 (e.g. 0.7) | Tighter = more headroom, more cost | Too high = spill before scale reacts |
| Scheduled action | Time-based floor changes | cron/rate | Predictable diurnal peaks | Time-zone mistakes ramp at the wrong hour |

The states a PC config moves through, and what each means for traffic — do not shift traffic until it is READY:

| PC Status | Meaning | Safe to serve? | What to do |
|---|---|---|---|
| IN_PROGRESS | Environments still initializing | No (serves cold meanwhile) | Wait; don’t promote the alias yet |
| READY | Pool fully warm and allocated | Yes | Shift traffic to this alias |
| FAILED | Allocation failed | No | Check account concurrency / quota; retry |

# Don't promote until READY, and confirm the allocated count
aws lambda get-provisioned-concurrency-config \
  --function-name order-processor --qualifier live \
  --query '{status:Status, allocated:AllocatedProvisionedConcurrentExecutions}'
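
In a deploy script the same check is usually a polling loop. Below is a minimal sketch that takes a boto3-style Lambda client as a parameter, so it can be exercised with a stub; `wait_for_ready`, its timeout, and its poll interval are hypothetical choices, while the response keys follow the `get_provisioned_concurrency_config` fields queried above.

```python
import time


def wait_for_ready(client, function_name, qualifier,
                   timeout_s=300, poll_s=5, sleep=time.sleep):
    """Poll until provisioned concurrency is READY; raise on FAILED or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        cfg = client.get_provisioned_concurrency_config(
            FunctionName=function_name, Qualifier=qualifier
        )
        status = cfg["Status"]
        if status == "READY":
            # Return the allocated count so the caller can compare it to the request.
            return cfg["AllocatedProvisionedConcurrentExecutions"]
        if status == "FAILED":
            raise RuntimeError(cfg.get("StatusReason", "allocation failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status} after {timeout_s}s")
        sleep(poll_s)
```

In a real pipeline you would pass `boto3.client("lambda")` as `client` and shift the alias only after the call returns the count you asked for.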

The decision of how much PC to provision is a small table of trade-offs, not a guess:

| If your traffic is… | Provision… | Driven by… | Why |
|---|---|---|---|
| Flat, predictable | A static floor near steady-state concurrency | Fixed PC | Simplicity; no spill |
| Diurnal (business hours) | Low floor + scheduled ramp | Scheduled AutoScaling | Pay for peak only when it exists |
| Spiky but gradual | Utilization target (0.7) | Target-tracking | Scales with demand; some lead time |
| Flash-spike (seconds) | Higher static floor for the window | Pre-scaled scheduled action | Target-tracking can’t react in seconds |
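
For a first estimate of the floor, a rough sketch helps. It assumes steady-state concurrency is approximately requests per second multiplied by average duration (the usual rule of thumb), then divides by the utilization target so the pool keeps headroom. The traffic numbers are hypothetical and `pc_floor` is an illustrative name, not an AWS API.

```python
import math


def required_concurrency(requests_per_second: float, avg_duration_ms: float) -> float:
    """In-flight executions at steady state: arrival rate times time in service."""
    return requests_per_second * (avg_duration_ms / 1000)


def pc_floor(requests_per_second: float, avg_duration_ms: float,
             target_utilization: float = 0.7) -> int:
    """Provisioned concurrency that keeps utilization at or below the target."""
    needed = required_concurrency(requests_per_second, avg_duration_ms)
    return math.ceil(needed / target_utilization)


# Hypothetical peak: 200 requests/s at 250 ms average duration.
print(required_concurrency(200, 250))  # 50.0 in-flight executions
print(pc_floor(200, 250))              # 72 environments at a 0.7 target
```

Treat the result as a starting point to validate against ProvisionedConcurrencySpilloverInvocations, since real traffic is burstier than an average.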

SnapStart: snapshot-restore instead of re-init

SnapStart attacks cold starts from a different angle. Instead of keeping environments warm and paying for idle capacity, Lambda runs your init once at publish time, takes a Firecracker microVM snapshot of the initialized memory and disk, and restores from that snapshot on cold start instead of re-running init. It carries no provisioned-concurrency idle cost, although on the Python and .NET runtimes AWS bills separately for caching and restoring snapshots, so check the pricing page before treating it as free. SnapStart began on Java and AWS has extended it to Python and .NET runtimes; always confirm the runtimes available in your account’s region before committing.

# AWS SAM — enable SnapStart on a Java function
OrderProcessor:
  Type: AWS::Serverless::Function
  Properties:
    Runtime: java21
    Handler: com.example.Handler::handleRequest
    MemorySize: 1024
    SnapStart:
      ApplyOn: PublishedVersions
    AutoPublishAlias: live

SnapStart eligibility and behaviour differ by runtime family, and the wrong assumption here wastes a sprint — confirm your runtime and region before designing around it:

| Runtime family | SnapStart support | Priming hook needed? | Typical cold-start win | Note |
|---|---|---|---|---|
| Java (Corretto 11/17/21) | Yes (original target) | Yes: JIT/class-load priming pays off most | Large (multi-second init → sub-second restore) | CRaC org.crac hooks; biggest beneficiary |
| Python (3.12+) | Yes (region-dependent) | Rarely: interpreted, less JIT | Moderate (import-heavy init) | Use lifecycle hooks for re-seed/refresh |
| .NET (8+) | Yes (region-dependent) | Sometimes: JIT/tiered-compilation | Moderate–large | Confirm regional availability |
| Node.js | No | n/a | Use provisioned concurrency instead | No snapshot model for Node |
| Go / Rust (custom runtime) | No | n/a | Fast init already; PC if needed | Native binaries cold-start fast |
| Container image functions | No (SnapStart is zip-based) | n/a | Trim image; PC for warmth | SnapStart doesn’t apply to images |

The caveats are real and you must design for them. They fall into three classes, each with a concrete failure mode:

| Caveat class | What goes wrong | Why | Where you fix it |
|---|---|---|---|
| Uniqueness | Same UUID/seed/timestamp across every restored env | Generated once at snapshot, then cloned | afterRestore: regenerate per-env values |
| Stale state | Dead DB connections, expired tokens/creds | Captured live at snapshot, expire by restore | afterRestore: re-establish/refresh |
| Priming | First real request still slow (JIT/lazy-load) | Restore is fast but JVM defers compilation | beforeCheckpoint: exercise hot paths |

Anything generated once during init and captured in the snapshot — a random seed, a UUID, a cached timestamp — is now identical across every restored environment. Re-seed SecureRandom and regenerate per-invocation values after restore, not at class load. The AWS Cryptography libraries handle this for you; hand-rolled randomness does not. Network connections, credentials, and ephemeral tokens captured in the snapshot may be dead or expired on restore — re-establish them in a runtime hook. And while restore is fast, the JVM may still JIT-compile and lazy-load on the first real request, so use the beforeCheckpoint hook to prime hot paths (dummy invocations of your serialization, an SDK call) so that work is captured in the snapshot.

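import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;

public class Handler implements Resource {
  public Handler() {
    // Register this instance so the runtime invokes the hooks below
    Core.getGlobalContext().register(this);
  }

  @Override
  public void beforeCheckpoint(Context<? extends Resource> context) {
    // Prime: exercise hot paths so JIT/class-load is captured in the snapshot
    warmSerializers();
    warmSdkClients();
  }

  @Override
  public void afterRestore(Context<? extends Resource> context) {
    // Re-establish anything that must be fresh per environment
    reSeedSecureRandom();
    refreshDbCredentials();
  }

  // Placeholders: replace each body with your own priming or refresh logic
  private void warmSerializers() {}
  private void warmSdkClients() {}
  private void reSeedSecureRandom() {}
  private void refreshDbCredentials() {}
}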

The CRaC lifecycle hooks, and exactly what belongs in each — putting work in the wrong hook is the most common SnapStart bug:

| Hook | Runs… | Put here | Never put here |
|---|---|---|---|
| beforeCheckpoint | Once, at publish (pre-snapshot) | Priming: serializers, SDK warm-up, class-load | Anything that must be unique per env |
| afterRestore | On every restore (cold start) | Re-seed randomness, refresh creds, reconnect | Heavy one-time work (defeats the purpose) |
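
The same split applies to Python functions. The sketch below assumes the `snapshot_restore_py` module and its `register_before_snapshot` / `register_after_restore` decorators, as I understand them from the Lambda Python runtime documentation; verify the names against your runtime. The no-op fallback exists only so the file also runs outside Lambda, and `ENVIRONMENT_ID` is a hypothetical per-environment value.

```python
import json
import uuid

try:
    # Expected to be available in the Lambda Python runtime (assumption: verify).
    from snapshot_restore_py import register_after_restore, register_before_snapshot
except ImportError:
    # Local fallback for illustration: decorators that register nothing.
    def register_before_snapshot(func):
        return func

    def register_after_restore(func):
        return func

# Generated during init, so it is captured in the snapshot and cloned on restore.
ENVIRONMENT_ID = uuid.uuid4().hex


@register_before_snapshot
def prime():
    """Exercise hot paths so their imports and caches land in the snapshot."""
    json.dumps({"warm": True})


@register_after_restore
def refresh_identity():
    """Regenerate anything that must differ per restored environment."""
    global ENVIRONMENT_ID
    ENVIRONMENT_ID = uuid.uuid4().hex


def handler(event, context):
    return {"environment": ENVIRONMENT_ID}
```

Without the after-restore hook, every environment restored from the same snapshot would report the same `ENVIRONMENT_ID`, which is the uniqueness failure described in the caveats table.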

SnapStart vs provisioned concurrency is a real decision, not a default. SnapStart removes most of the init cold start with no idle charge but does nothing for sub-millisecond consistency and adds restore + priming complexity; provisioned concurrency gives the flattest tail latency but you pay for warm capacity continuously. Many teams run SnapStart by default and reserve PC for the few endpoints with the strictest p99. Side by side:

| Dimension | Provisioned Concurrency | SnapStart |
|---|---|---|
| Cold start removed? | Yes (fully, within the pool) | Mostly (restore replaces init) |
| Idle cost | Yes: pay while enabled | No: pay only per invocation |
| Runtime support | All runtimes | Java, Python, .NET (region-dependent) |
| Tail-latency consistency | Flattest (no restore variance) | Restore + JIT priming variance |
| Code changes required | None | CRaC hooks (re-seed, refresh, prime) |
| Spillover behaviour | Overflow cold-starts on-demand | Each cold start restores (still fast) |
| Best for | Strict p99, any runtime | Cost-sensitive JVM/.NET/Python cold starts |
| Configured on | Version/alias | Published versions |

The combined pattern many teams settle on, expressed as a decision table:

| Constraint | Reach for | Why |
|---|---|---|
| Strict p99 on a Node/Go function | Provisioned concurrency | No SnapStart for Node/Go; PC flattens the tail |
| JVM cold starts dominate, cost matters | SnapStart + priming | Removes class-load/JIT tax for free |
| JVM with both cost pressure and strict p99 | SnapStart for the floor + PC for the peak | Free init removal + bought tail flatness |
| Python with heavy imports, cost-sensitive | SnapStart | Snapshots the import-heavy init |

Connection management and reuse across invocations

The most common self-inflicted latency bug is doing expensive setup inside the handler: opening a database connection, constructing an HTTP client, or fetching a secret. That work then runs on every warm invocation. Move it to module/static scope so it is created once during init and reused across invocations on the same environment.

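import os
import boto3
import psycopg2

# INIT SCOPE: runs once per environment, reused by every warm invocation
_secrets = boto3.client("secretsmanager")  # SDK client built once (credential fetch omitted)
_conn = None

def _get_conn():
    """Return the cached connection, reconnecting lazily if it was closed."""
    global _conn
    if _conn is None or _conn.closed:
        # Assumes user, password and database come from libpq's PG* environment variables
        _conn = psycopg2.connect(host=os.environ["DB_HOST"], connect_timeout=3)
        _conn.autocommit = True  # don't leave a transaction open between invocations
    return _conn

def handler(event, context):
    with _get_conn().cursor() as cur:    # reuse the connection; close only the cursor
        cur.execute("SELECT 1")
        return {"ok": cur.fetchone()[0]}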

For Node, set AWS_NODEJS_CONNECTION_REUSE_ENABLED=1 so the SDK reuses keep-alive TCP connections (the default in SDK v3, but harmless to set explicitly). What belongs in init scope versus the handler is a clean rule you can audit code against:

| Work | Where it belongs | Why | Cost of getting it wrong |
|---|---|---|---|
| SDK / service clients | Init scope | Constructed once, thread-safe, reusable | Per-invocation construction latency |
| DB connection / pool | Init scope (lazy-guarded) | Reuse the TCP/auth handshake | Connection blowup; handshake per call |
| Static config / secrets | Init scope (cached) | Fetch once, reuse | Repeated SSM/Secrets calls, throttling |
| HTTP keep-alive client | Init scope | Reuse the connection pool | New TCP per call; SNAT/port pressure |
| Per-request state | Handler | Must be fresh each invocation | Cross-request data bleed (a real bug) |
| Per-request randomness / timestamps | Handler | Must differ per call | Duplicate IDs (worse under SnapStart) |

The deeper problem at scale is connection-count blowup: 500 concurrent Lambda environments each holding a Postgres connection can exhaust max_connections on a smaller database instance, and they consume per-connection memory on any instance. Amazon RDS Proxy solves this by pooling and multiplexing connections on Lambda’s behalf, and it lets functions authenticate with IAM instead of embedding database secrets. (The full operational treatment is in RDS Proxy in Production: Connection Pooling, Failover Acceleration, and IAM Authentication.)

aws rds create-db-proxy \
  --db-proxy-name app-proxy \
  --engine-family POSTGRESQL \
  --auth '[{"AuthScheme":"SECRETS","SecretArn":"arn:aws:secretsmanager:us-east-1:111122223333:secret:db-creds","IAMAuth":"REQUIRED"}]' \
  --role-arn arn:aws:iam::111122223333:role/rds-proxy-role \
  --vpc-subnet-ids subnet-0a1b2c subnet-0d4e5f

Point the function’s DB_HOST at the proxy endpoint, attach the function to the same VPC subnets, and let the proxy absorb the connection churn. This is non-negotiable above a few hundred concurrent executions against a relational database. The choices for taming connections, and what each buys:

| Approach | What it does | Effort | When it’s enough | Limit / watch-out |
|---|---|---|---|---|
| Init-scope reuse (code) | One connection per env, reused | Code change | Low/moderate concurrency | Still 1 conn × concurrency at the DB |
| AWS_NODEJS_CONNECTION_REUSE_ENABLED=1 | SDK keep-alive reuse | One env var | Node SDK HTTP reuse | SDK v3 already does it |
| RDS Proxy | Pools + multiplexes; IAM auth | Proxy + IAM + subnets | High concurrency on RDS/Aurora | Small hourly cost; VPC plumbing |
| DynamoDB instead of RDS | No connection model at all | Re-architecture | Key-value access patterns | Different data model |
| Reserved concurrency cap | Bounds connections from this fn | One setting | Protecting a fragile DB | Throttles past the cap |

A worked sizing example: a relational instance has a finite max_connections (a few hundred on mid-size classes). With one connection per environment and 500 concurrent environments, you need 500 connections — past the ceiling, and the database starts refusing connects, which surfaces in Lambda as dependency timeouts, not as an obvious “too many connections” on the Lambda side. RDS Proxy multiplexes those 500 environments onto a far smaller pool of actual backend connections.
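The sizing arithmetic above can be written down as a small check. This is a sketch under stated assumptions: the ceiling of 400 connections and the 20 connections held back for admin tooling are illustrative numbers (the text only says "a few hundred"), and the function names are hypothetical.

```javascript
// Compare the connections Lambda would open against the database ceiling.
function connectionPressure({ concurrentEnvironments, connectionsPerEnvironment = 1, maxConnections, reservedForAdmin = 0 }) {
  const needed = concurrentEnvironments * connectionsPerEnvironment;
  const available = maxConnections - reservedForAdmin;
  return { needed, available, overBy: Math.max(0, needed - available), exhausted: needed > available };
}

// Largest reserved-concurrency cap that keeps this function under the ceiling
// when every environment connects directly (no proxy in front).
function safeReservedConcurrency({ maxConnections, connectionsPerEnvironment = 1, reservedForAdmin = 0 }) {
  return Math.floor((maxConnections - reservedForAdmin) / connectionsPerEnvironment);
}

// 500 environments against an assumed ceiling of 400 connections.
const direct = connectionPressure({ concurrentEnvironments: 500, maxConnections: 400 });
console.log(direct.exhausted, direct.overBy);

// Assumed: keep 20 connections free for admin and monitoring sessions.
console.log(safeReservedConcurrency({ maxConnections: 400, reservedForAdmin: 20 }));
```

If the safe cap is lower than the concurrency the workload needs, that gap is the case for RDS Proxy rather than a bigger reserved number.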

Concurrency controls: reserved, throttles, and quota planning

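Concurrency is the number of in-flight executions. Your account has a regional concurrency limit (1,000 by default, raisable via a Service Quotas request). Two controls shape how that pool is shared: reserved concurrency, which carves out a slice for one function and also caps it there, and provisioned concurrency, which keeps part of that capacity pre-initialized. Setting reserved concurrency is a single call: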

# Cap order-processor at 200 concurrent executions
aws lambda put-function-concurrency \
  --function-name order-processor \
  --reserved-concurrent-executions 200

The three concurrency concepts confuse everyone at first; side by side they are clear:

| Concept | What it is | Guarantees a floor? | Caps a max? | Pre-warmed? | Charged when idle? |
| --- | --- | --- | --- | --- | --- |
| Account/regional limit | Total in-flight for the region | No (shared) | Yes (account-wide) | No | No |
| Reserved concurrency | A function’s carved-out slice | Yes | Yes | No | No |
| Provisioned concurrency | Pre-initialized environments on a version/alias (inside the reserved slice, if one is set) | Yes (warm) | No (excess spills to on-demand) | Yes | Yes |
| Unreserved pool | What’s left after reservations | No | Implicitly | No | No |

When a function hits its reserved limit (or the account hits the regional limit), Lambda throttles — and how the throttle surfaces depends entirely on the invocation type. This is the table to keep open during a spike incident:

| Invocation type | Examples | On throttle | Retries? | What the caller sees |
| --- | --- | --- | --- | --- |
| Synchronous | API Gateway, ALB, direct invoke | Rejected immediately | No (caller must) | 429 TooManyRequestsException |
| Asynchronous | S3, SNS, EventBridge | Lambda retries with backoff | Yes (up to ~6 h, then DLQ) | Delayed processing; DLQ on exhaustion |
| Event source mapping | SQS, Kinesis, DynamoDB Streams | Batch retried per source rules | Yes (source-dependent) | Backlog grows; iterator age climbs |
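Because a throttled synchronous call is rejected rather than queued, the caller owns the retry. The sketch below shows one common approach, exponential backoff with full jitter. It assumes the error carries the name TooManyRequestsException or an HTTP 429 in $metadata (the shape AWS SDK for JavaScript v3 errors use); invokeWithRetry and backoffDelayMs are hypothetical helper names. The AWS SDKs already retry throttles a few times on their own, so this is mainly for callers that need explicit control.

```javascript
// Full jitter: wait a random time in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(attempt, { baseMs = 100, capMs = 5000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry only when the failure is a throttle; any other error is rethrown at once.
async function invokeWithRetry(invoke, { maxAttempts = 4, sleep = defaultSleep, ...backoff } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await invoke();
    } catch (err) {
      const throttled =
        err.name === "TooManyRequestsException" ||
        (err.$metadata && err.$metadata.httpStatusCode === 429);
      if (!throttled || attempt + 1 >= maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt, backoff));
    }
  }
}
```

The jitter matters during a spike: without it, every throttled caller retries at the same instant and re-creates the burst that caused the throttle.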

There is also a concurrency scaling rate that governs how fast you can scale from cold: Lambda currently lets each function add up to 1,000 concurrent executions every 10 seconds until it reaches the account limit (the older model was an initial burst followed by a slower per-minute ramp). A flash spike can outrun that ramp even when you are nowhere near the account limit. The signals that tell you which wall you hit, and the fix:

| Signal | Metric / where | Means | Fix |
| --- | --- | --- | --- |
| Throttles climbing, ConcurrentExecutions at account limit | CloudWatch Lambda metrics | Account/regional cap hit | Service Quotas increase; reserve to partition |
| Throttles on one function only | Per-function Throttles | That function’s reserved cap hit | Raise its reserved; or it’s protecting a downstream, leave it |
| ProvisionedConcurrencySpilloverInvocations > 0 | Per-alias metric | Demand exceeded the PC pool | Raise PC floor / AutoScaling max |
| Throttles during a fast ramp, below account limit | ConcurrentExecutions slope | Burst ceiling outrun | Pre-scale PC for the window; smooth the spike |
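The table's decision order can be sketched as a small classifier over one datapoint of metrics. The field names and return labels are hypothetical; in practice the values would come from the CloudWatch metrics named above, read over the same time window.

```javascript
// Decide which wall was hit, following the order of the signals table.
function classifyThrottle(metrics) {
  const {
    throttles = 0,
    spilloverInvocations = 0,
    accountConcurrency,
    accountLimit,
    functionConcurrency,
    reservedLimit, // undefined when the function has no reserved concurrency
  } = metrics;

  if (throttles > 0) {
    if (accountConcurrency >= accountLimit) return "account-limit";
    if (reservedLimit !== undefined && functionConcurrency >= reservedLimit) return "reserved-cap";
    // Throttled while below every cap: scale-up was slower than the spike.
    return "scaling-rate";
  }
  // No throttles, but requests leaked past the warm pool into on-demand cold starts.
  if (spilloverInvocations > 0) return "pc-spillover";
  return "healthy";
}
```

Each label maps to one row's fix: a quota increase, a higher (or deliberately unchanged) reserved cap, pre-scaling for the window, or a larger provisioned-concurrency floor.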

The hard limits that shape every concurrency and performance decision — the real numbers, what they cap, and whether you can raise them:

| Limit | Default value | What it caps | Raisable? | Hit it and you get… |
| --- | --- | --- | --- | --- |
| Account concurrency (per region) | 1,000 | Total in-flight executions | Yes (Service Quotas) | Account-wide throttles (429) |
| Memory per function | 128 MB – 10,240 MB | RAM (and CPU share) | No (it’s the range) | Can’t exceed 10,240 MB |
| Function timeout | 3 s default, 900 s max | Max single-invocation duration | No (max is 900 s) | Invocation killed at the cap |
| API Gateway integration timeout | 29 s | Sync request behind API GW | No | 504/timeout to the client |
| Deployment package (zip, direct) | 50 MB zipped | Upload size via API | Use S3 / image instead | Upload rejected |
| Deployment package (unzipped) | 250 MB | Code + layers unzipped | No | Deploy rejected |
| Layers per function | 5 | Attached layers | No | Can’t add a 6th |
| /tmp ephemeral storage | 512 MB – 10,240 MB | Scratch disk | Configurable in range | Disk-full errors |
| Environment variables size | 4 KB total | Env var payload | No | Config rejected |
| Provisioned concurrency | ≤ reserved/account | Pre-warmed pool size | Via the above | Can’t exceed the slice |
| Burst concurrency (scaling rate) | +1,000 concurrent executions per 10 s, per function | Scale-from-cold rate | No (managed) | Throttles during sharp ramps |
| Invocation payload (sync) | 6 MB | Request/response body | No | RequestEntityTooLargeException |
| Invocation payload (async) | 256 KB | Event body | No | Event rejected |
| Function + layer storage (account) | 75 GB (default) | Total code storage | Yes (Service Quotas) | CodeStorageExceededException on deploy |
| Concurrent executions per PC config | = allocated PC | Warm pool ceiling | Via account/reserved | Spillover to on-demand (cold) |
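A few of these limits are easy to check before a deploy rather than discover from a rejected one. The sketch below encodes only the numbers from the table; checkFunctionConfig is a hypothetical helper, and the environment-variable size is approximated as the sum of key and value bytes, which may not match exactly how Lambda counts it.

```javascript
// Limits copied from the table above.
const LIMITS = {
  memoryMb: { min: 128, max: 10240 },
  timeoutSeconds: 900,
  apiGatewayTimeoutSeconds: 29,
  layers: 5,
  envVarBytes: 4 * 1024,
};

// Return a list of human-readable problems; an empty list means nothing was flagged.
function checkFunctionConfig({ memoryMb, timeoutSeconds, layers = [], envVars = {}, behindApiGateway = false }) {
  const problems = [];
  if (memoryMb < LIMITS.memoryMb.min || memoryMb > LIMITS.memoryMb.max) {
    problems.push(`memory ${memoryMb} MB is outside 128-10240 MB`);
  }
  if (timeoutSeconds > LIMITS.timeoutSeconds) {
    problems.push(`timeout ${timeoutSeconds} s exceeds the 900 s maximum`);
  } else if (behindApiGateway && timeoutSeconds > LIMITS.apiGatewayTimeoutSeconds) {
    problems.push(`timeout ${timeoutSeconds} s outlives the 29 s API Gateway ceiling`);
  }
  if (layers.length > LIMITS.layers) {
    problems.push(`${layers.length} layers attached; the limit is 5`);
  }
  // Approximation: total bytes of all keys and values.
  const envBytes = Object.entries(envVars).reduce(
    (sum, [key, value]) => sum + Buffer.byteLength(key) + Buffer.byteLength(String(value)),
    0
  );
  if (envBytes > LIMITS.envVarBytes) {
    problems.push(`environment variables total ${envBytes} bytes; the limit is 4 KB`);
  }
  return problems;
}
```

Running a check like this in CI catches the mismatch that bites most often in practice: a function timeout longer than the gateway in front of it will ever wait.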

Plan for it: set reserved concurrency on the function fronting your most fragile dependency, alarm on the Throttles metric, and request a regional quota increase before a launch, not during the incident.

# Terraform: reserve concurrency to partition + protect a downstream
resource "aws_lambda_function" "order_processor" {
  function_name                  = "order-processor"
  filename                       = "function.zip"   # deployment package; path is a placeholder
  role                           = aws_iam_role.lambda.arn
  handler                        = "handler.handler"
  runtime                        = "nodejs20.x"
  memory_size                    = 1024
  reserved_concurrent_executions = 200   # cap + guarantee
}

Observability: see the cold starts you are paying for

You cannot tune what you cannot measure. Four layers, and each answers a different question. First, the layer map:

| Layer | What it gives you | Setup | Best for |
| --- | --- | --- | --- |
| CloudWatch Logs Insights | Query REPORT fields at scale | None (logs exist) | Cold-start %, init cost, p99, peak memory |
| CloudWatch metrics | Throttles, ConcurrentExecutions, spillover | None (emitted) | Alarms; spike/throttle/spillover signals |
| Lambda Insights | Per-function CPU/mem/network/init | One layer + policy | Resource view without parsing logs |
| AWS X-Ray | Per-request segment breakdown | Active tracing flag | “Is the latency mine or a dependency’s?” |

CloudWatch Logs Insights — quantify cold-start frequency and init cost straight from the REPORT lines:

filter @type = "REPORT"
| fields @initDuration, @duration, @billedDuration, @maxMemoryUsed / 1000000 as memUsedMB
| stats count(*) as invocations,
        count(@initDuration) as coldStarts,
        avg(@initDuration) as avgInitMs,
        pct(@duration, 99) as p99DurationMs,
        max(memUsedMB) as peakMemMB

If peakMemMB sits far below your configured memory, you over-allocated; if coldStarts / invocations is high on a latency-sensitive function, that is your provisioned-concurrency / SnapStart signal. The questions you will ask in an incident, each as a one-line query target:

| Question | Field(s) | Read it as |
| --- | --- | --- |
| How often do we cold-start? | count(@initDuration) / count(*) | High on a sync API → buy warmth |
| How expensive is init? | avg(@initDuration) | Drives the cold-start tail |
| Is p99 inside SLO? | pct(@duration, 99) | The number the SLA is written against |
| Did we over-allocate memory? | max(@maxMemoryUsed) vs Memory Size | Far below → drop memory |
| Are we paying for rounding? | @billedDuration vs @duration | Billed duration rounds up to the next 1 ms |
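When you only have a handful of REPORT lines (for example the output of an invoke with --log-type Tail), the same questions can be answered locally. The sketch below parses the standard REPORT fields with regular expressions and computes the figures from the query. parseReportLine and summarize are hypothetical helper names, and the p99 uses a simple nearest-rank method, which may differ slightly from how Logs Insights interpolates pct().

```javascript
// One pattern per REPORT field; the lookbehind keeps plain "Duration" from
// matching inside "Billed Duration", "Init Duration" or "Restore Duration".
const FIELD_PATTERNS = {
  durationMs: /(?<!Billed |Init |Restore )Duration: ([\d.]+) ms/,
  billedDurationMs: /Billed Duration: ([\d.]+) ms/,
  memorySizeMb: /Memory Size: ([\d.]+) MB/,
  maxMemoryUsedMb: /Max Memory Used: ([\d.]+) MB/,
  initDurationMs: /Init Duration: ([\d.]+) ms/, // present only on cold starts
};

function parseReportLine(line) {
  const parsed = {};
  for (const [field, pattern] of Object.entries(FIELD_PATTERNS)) {
    const match = line.match(pattern);
    parsed[field] = match ? Number(match[1]) : undefined;
  }
  return parsed;
}

function summarize(lines) {
  const reports = lines.filter((line) => line.startsWith("REPORT")).map(parseReportLine);
  const cold = reports.filter((r) => r.initDurationMs !== undefined);
  const durations = reports.map((r) => r.durationMs).sort((a, b) => a - b);
  const p99Index = Math.max(0, Math.ceil(0.99 * durations.length) - 1); // nearest rank
  return {
    invocations: reports.length,
    coldStarts: cold.length,
    coldStartRatio: reports.length ? cold.length / reports.length : 0,
    avgInitMs: cold.length ? cold.reduce((sum, r) => sum + r.initDurationMs, 0) / cold.length : undefined,
    p99DurationMs: durations[p99Index],
    peakMemMb: reports.length ? Math.max(...reports.map((r) => r.maxMemoryUsedMb)) : undefined,
  };
}
```

Feed summarize an array of log lines; anything that is not a REPORT line is ignored, so raw log output can be passed in unfiltered.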

Lambda Insights — a managed CloudWatch layer that surfaces CPU, memory, network, and init metrics per function with one config flag:

OrderProcessor:
  Type: AWS::Serverless::Function
  Properties:
    Policies:
      - CloudWatchLambdaInsightsExecutionRolePolicy
    Layers:
      - !Sub "arn:aws:lambda:${AWS::Region}:580247275435:layer:LambdaInsightsExtension:53"

AWS X-Ray — turn on active tracing to break a request into segments. The init subsegment shows cold-start cost, and downstream segments (DynamoDB, RDS, an HTTP call) reveal whether your latency is actually in your code or in a dependency you mistuned:

aws lambda update-function-configuration \
  --function-name order-processor \
  --tracing-config Mode=Active

The CloudWatch metrics worth an alarm, with starting thresholds — these are the leading indicators, not the lagging “errors spiked”:

| Metric | What it signals | Starting threshold | Why it’s leading |
| --- | --- | --- | --- |
| Throttles | Concurrency limit hit | > 0 sustained 5 min | Requests being rejected/queued now |
| ProvisionedConcurrencySpilloverInvocations | PC floor too low | > 0 sustained | Cold starts leaking past the pool |
| ConcurrentExecutions | Approaching account/reserved cap | > 80% of limit | Predicts throttles before they bite |
| Duration p99 | Warm-path latency creeping | > your SLO | Tail drifting toward a timeout |
| Errors | Function failures | > 1% of invocations | Confirmation; pair with the cause |
| ProvisionedConcurrencyUtilization | PC right-sized? | sustained < 0.3 or > 0.9 | Over- or under-provisioned warmth |
| IteratorAge (stream sources; for SQS watch ApproximateAgeOfOldestMessage) | Event-source backlog growing | > your freshness target | Throttles/slow consumers fall behind |
| DeadLetterErrors | Async failures not reaching DLQ | > 0 | Lost events; DLQ misconfigured |
| ClaimedAccountConcurrency | Account headroom consumed | > 80% of limit | Region nearing the global cap |
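As a quick sanity check of the starting thresholds, the sketch below applies a subset of them to a single snapshot of metric values. It is a simplification: real CloudWatch alarms evaluate over periods (the "sustained" part), which this ignores, and the input field names are hypothetical.

```javascript
// Apply the table's starting thresholds to one snapshot of metric values.
function evaluateAlarms(m) {
  const firing = [];
  if (m.throttles > 0) firing.push("Throttles");
  if (m.spilloverInvocations > 0) firing.push("ProvisionedConcurrencySpilloverInvocations");
  if (m.concurrentExecutions > 0.8 * m.concurrencyLimit) firing.push("ConcurrentExecutions");
  if (m.p99DurationMs > m.sloMs) firing.push("Duration p99");
  if (m.invocations > 0 && m.errors / m.invocations > 0.01) firing.push("Errors");
  if (m.pcUtilization !== undefined && (m.pcUtilization < 0.3 || m.pcUtilization > 0.9)) {
    firing.push("ProvisionedConcurrencyUtilization");
  }
  return firing;
}
```

A snapshot with no throttles but concurrency above 80% of the limit and utilization above 0.9 fires only the two leading indicators, which is the point of alarming on them: they move before requests are rejected.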

Cost vs latency: a decision framework

There is no universal “fastest” setting — there is the cheapest setting that meets your latency SLO. Walk it in this order; each row is symptom → first lever → then consider:

| Symptom | First lever | Then consider |
| --- | --- | --- |
| Function feels slow, no SLO pressure | Power Tuning (right-size memory) | Trim package / init code |
| High p99 on a synchronous API | Power Tuning, then PC on the alias | SnapStart if JVM/Python/.NET |
| JVM cold starts dominate, cost-sensitive | SnapStart with priming hooks | PC for the few strict-p99 paths |
| DB connection errors at scale | Init-scope reuse + RDS Proxy | Reserved concurrency on the DB-facing fn |
| Throttles under spike | Request regional quota increase | Reserved concurrency to protect/partition |
| Spillover cold starts past the PC pool | Raise PC floor / AutoScaling max | Pre-scale for known flash windows |
| Strict, flat tail latency required | Provisioned concurrency | SnapStart alone won’t flatten it |

The guiding principle: tune memory before you buy warmth. Right-sizing is free and often cuts both latency and cost; provisioned concurrency and SnapStart are how you buy down the cold start that remains, and they trade money or complexity for tail latency. Spend that money only on the paths whose SLO actually requires it. The full lever menu, ranked by what it costs and what it fixes:

| Lever | What it fixes | Cost | Effort | Trade-off |
| --- | --- | --- | --- | --- |
| Right-size memory | Slow warm path; wasted GB-s | Free (often saves) | Low (Power Tuning) | Must tune with real payload |
| Trim package / init | Oversized init phase | Free | Medium | Build discipline |
| Init-scope reuse | Per-invocation connection cost | Free | Low (code move) | None (pure win) |
| SnapStart | Init cold start (JVM/Py/.NET) | No idle charge (free on Java; snapshot cache and restore charges on Python/.NET) | Medium (runtime hooks; CRaC on Java) | Snapshot caveats |
| Provisioned concurrency | Cold start on any runtime | Idle charge while enabled | Low | Pay for warmth |
| RDS Proxy | Connection blowup at scale | Small hourly | Medium (VPC/IAM) | Plumbing |
| Reserved concurrency | Throttle blast radius | Free | Low | Caps the function |
| Quota increase | Account-limit throttles | Free (request) | Low (lead time) | Must ask ahead |

Architecture at a glance

The diagram traces a single synchronous invocation across four zones, left to right, and marks the five places latency or throttles actually bite. A client calls API Gateway (the sync entry, with its 29-second ceiling), which invokes the function. The function lands in the warmth tier, where one of three things serves it: a provisioned-concurrency environment that is already initialized and ready (the flat-tail path), a SnapStart environment that restores from a snapshot instead of re-running init, or — when demand exceeds the warm pool — an on-demand environment that pays a full cold start (the spillover path, marked in red). Inside any of those, well-written code reuses init-scope connections into the RDS Proxy (which pools and multiplexes onto the database) and DynamoDB (keep-alive reuse), rather than opening a connection per call.

The fourth zone is the control and observability loop that makes the rest work. Application Auto Scaling watches provisioned-concurrency utilization (target 0.70) and raises or lowers the warm pool, closing the loop back into the warmth tier — that is the arrow from control back to the function. CloudWatch and X-Ray capture Init Duration, Throttles, spillover and per-segment latency, which is how you prove a lever worked rather than assuming it did. Follow the five numbered badges in order — spillover cold start, SnapStart uniqueness/stale state, connection blowup, PC-not-ready/on-$LATEST, and the throttle wall — and the legend narrates each as symptom, the metric that confirms it, and the fix.

Figure: AWS Lambda cold-start performance architecture. A client calls API Gateway (29-second cap), which invokes a function served by one of three warmth paths (provisioned-concurrency pool on an alias, SnapStart restore, or on-demand spillover cold start), with init-scope connection reuse into RDS Proxy and DynamoDB, an Application Auto Scaling loop targeting 0.70 utilization, and CloudWatch plus X-Ray for observability; five numbered badges mark the failure points.

Real-world scenario

Solvent Pay, a payments platform, ran a synchronous “authorize transaction” Lambda (Java 17, Spring) behind API Gateway in ap-south-1. p50 was a healthy 40 ms, but p99 spiked to 6+ seconds whenever traffic stepped up — classic JVM cold starts as new environments spun to meet demand. The constraint was hard: a contractual p99 < 800 ms with the card network, and a finance mandate to cut Lambda spend that had ballooned after a previous engineer “fixed” an earlier latency issue by setting provisioned concurrency to a flat 300 around the clock — paying for 300 warm JVMs at 3 AM for a daytime workload. Monthly Lambda spend on that one function had crossed ₹2.4 lakh.

The team reworked it in three moves, in the right order. First, Power Tuning (run with a real authorization payload, not an empty event) showed the function was CPU-bound; moving from 1024 MB to 1769 MB cut warm duration by ~45% at roughly neutral cost — fewer GB-seconds per call offsetting the higher memory. That was the free win, taken first. Second, they enabled SnapStart with a beforeCheckpoint hook that primed the Spring context, the Jackson serializers, and the SDK clients, and an afterRestore hook that re-seeded SecureRandom and refreshed the database credentials. This removed the multi-second class-load/JIT penalty from cold starts entirely, at zero idle cost — and the afterRestore work was not optional: an early test build skipped the re-seed and two restored environments generated the same idempotency key, which their integration suite caught before it reached production.

Third, they replaced the flat 300 provisioned concurrency with Application Auto Scaling: a small floor (10) for always-on readiness, a target-tracking policy at 0.70 utilization, and a scheduled action that pre-ramped to 150 fifteen minutes before the daily 18:00 peak (because target-tracking alone reacts too slowly for a sharp diurnal step).

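# The combination described above: SnapStart for the floor, scheduled PC for the peak.
# Caveat: Lambda does not support SnapStart and provisioned concurrency on the same
# function version, so treat the two blocks below as alternatives (or split them
# across separate functions) rather than deploying them together.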
AuthorizeTxn:
  Type: AWS::Serverless::Function
  Properties:
    Runtime: java17
    MemorySize: 1769
    SnapStart:
      ApplyOn: PublishedVersions
    AutoPublishAlias: live
    ProvisionedConcurrencyConfig:
      ProvisionedConcurrentExecutions: 10   # baseline; Application Auto Scaling ramps to 150 on schedule
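The scaling behaviour behind that baseline comment can be sketched as a pure function. This is a simplified model of the idea (size the pool so in-use environments sit at the target utilization, then clamp between floor and ceiling), not Application Auto Scaling's actual algorithm, which also applies cooldowns and alarm evaluation periods. The function name is hypothetical, and using 150 as the default ceiling is an assumption borrowed from the scheduled ramp.

```javascript
// Pool size that would put `inUse` environments at the target utilization,
// clamped to the configured floor and ceiling. Integer math avoids
// floating-point surprises when dividing by a fraction such as 0.70.
function desiredProvisionedConcurrency({ inUse, targetPercent = 70, min = 10, max = 150 }) {
  const raw = Math.ceil((inUse * 100) / targetPercent);
  return Math.min(max, Math.max(min, raw));
}

// Quiet hours: 3 environments in use, so the floor of 10 holds.
console.log(desiredProvisionedConcurrency({ inUse: 3 }));

// Daytime load: 84 in use needs a pool of 120 to sit at 70% utilization.
console.log(desiredProvisionedConcurrency({ inUse: 84 }));

// A scheduled action is modelled here as raising the floor ahead of the peak.
console.log(desiredProvisionedConcurrency({ inUse: 20, min: 150, max: 150 }));
```

The last call shows why the schedule exists: at 20 in-use environments target tracking alone would ask for only 29, which is far too few if the step to peak arrives faster than the policy reacts.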

The validation was disciplined: they ran the Logs Insights query over a window after each change and watched avgInitMs collapse toward the restore floor, p99DurationMs settle, ProvisionedConcurrencySpilloverInvocations go to zero during the ramped peak, and peakMemMB justify the 1769 MB. Result: p99 settled under 500 ms even during step-ups, and the provisioned-concurrency bill dropped roughly 70% versus the flat-300 configuration — to about ₹72,000/month. The lesson the team wrote into their runbook: “SnapStart removes the init tax for free; provisioned concurrency is for the peak tail you still cannot tolerate — and you scale it, you do not nail it to the floor. And never ship SnapStart without testing the afterRestore path, or you will clone an idempotency key into production.”

The incident-to-fix sequence, because the order of moves is the lesson:

| Phase | State | Action | Effect |
| --- | --- | --- | --- |
| Baseline | p99 6+ s on step-up, ₹2.4 L/mo | (flat PC 300, 1024 MB) | Slow tail and overspending |
| Move 1 | CPU-bound at 1024 MB | Power Tuning → 1769 MB | Warm duration −45%, ~neutral cost |
| Move 2 (bug) | SnapStart on, no re-seed | Skipped afterRestore | Duplicate idempotency key in test |
| Move 2 (fixed) | SnapStart + CRaC hooks | Prime + re-seed + refresh creds | Init tax gone, zero idle cost |
| Move 3 | Flat 300 PC wasteful | AutoScaling floor 10 + scheduled 150 | Pay for peak only when it exists |
| Result | p99 < 500 ms, ₹72 k/mo | Validated via Logs Insights | SLO met, bill −70% |
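The "roughly neutral cost" in Move 1 follows from GB-second arithmetic: compute cost scales with memory multiplied by duration, so a 1.73x memory increase paired with a 45% shorter run nearly cancels out. A minimal sketch, with hypothetical helper names; the 800 ms and 100 ms durations in the second example are illustrative values, not measurements from the scenario.

```javascript
// Billed compute for one invocation, in GB-seconds.
function gbSeconds(memoryMb, durationMs) {
  return (memoryMb / 1024) * (durationMs / 1000);
}

// Cost of the new setting relative to the old one (1.0 means unchanged).
// durationFactor is new duration / old duration, so a 45% cut is 0.55.
function relativeCost(fromMb, toMb, durationFactor) {
  return (toMb / fromMb) * durationFactor;
}

// Move 1: 1024 MB to 1769 MB with warm duration cut by about 45%.
console.log(relativeCost(1024, 1769, 0.55).toFixed(3));

// Illustrative: a CPU-starved 128 MB run versus a 1024 MB run that is 8x faster.
console.log(gbSeconds(128, 800), gbSeconds(1024, 100));
```

The second pair is the "128 MB is not cheap" case in miniature: when duration falls in proportion to the memory increase, the bill is the same and the caller gets the faster response.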

Advantages and disadvantages

The buy-or-build-warmth model — right-size free, then choose PC or SnapStart — gives you precise control, but each lever has a cost shape you must weigh honestly:

| Advantages (why this model helps) | Disadvantages (why it bites) |
| --- | --- |
| Memory tuning improves latency and cost at once (the only free lever) | “Memory is CPU” is unintuitive; teams set 128 MB and get slow, equal-cost functions |
| Provisioned concurrency gives a flat, predictable tail for any runtime | You pay for PC whether or not it’s invoked; flat 24x7 PC is a classic overspend |
| SnapStart removes the init tax with zero idle cost | Snapshot caveats are subtle: cloned UUIDs and dead connections cause real, hard-to-spot bugs |
| Reserved concurrency partitions the account and protects fragile downstreams | Set too low it throttles legitimate traffic; the cap is also a ceiling |
| RDS Proxy makes relational DBs survive high concurrency | Adds VPC/IAM plumbing and a small hourly cost; another hop to operate |
| Every lever is measurable (REPORT, metrics, X-Ray) | Tuning blind is easy if you skip instrumentation; the optimum is payload-specific |
| Application Auto Scaling makes warmth follow demand | Target-tracking reacts in minutes, not seconds; flash spikes need pre-scaling |

The model is right for any latency- or cost-sensitive serverless workload where you are willing to measure first and tune deliberately. It bites teams that reach for warmth before right-sizing (overspending on a problem free tuning would have solved), that enable SnapStart without testing the restore path (cloning state into production), or that nail PC to a flat number (paying for 3 AM warmth). Every disadvantage is manageable — but only if you know it exists, which is the entire point of measuring before you tune.

Hands-on lab

Right-size a function with Power Tuning, then add provisioned concurrency on an alias and prove it’s READY — all from the CLI. Free-tier-friendly (a few short invocations; we delete PC at the end so there’s no lingering idle charge). Run in CloudShell or any shell with the aws CLI configured.

Step 1 — Variables.

REGION=ap-south-1
FN=lab-cold-start-$RANDOM
# Assumes an execution role named lambda-basic-exec already exists in the account
ROLE_ARN=$(aws iam get-role --role-name lambda-basic-exec --query 'Role.Arn' --output text)

Step 2 — Create a tiny CPU-bound function (Node) at 128 MB to reproduce the problem.

cat > handler.js <<'EOF'
// Init scope: runs once per environment
const start = Date.now();
exports.handler = async () => {
  // A little CPU work so memory-as-CPU is visible
  let x = 0;
  for (let i = 0; i < 5_000_000; i++) x += Math.sqrt(i);
  return { ok: true, sinceInitMs: Date.now() - start, x };
};
EOF
zip function.zip handler.js

aws lambda create-function --function-name $FN --runtime nodejs20.x \
  --handler handler.handler --zip-file fileb://function.zip \
  --role "$ROLE_ARN" --memory-size 128 --timeout 10 --region $REGION

Step 3 — Invoke a few times and read the REPORT line. The first call is cold (Init Duration present); note Duration at 128 MB.

for i in 1 2 3; do
  aws lambda invoke --function-name $FN --region $REGION /dev/null \
    --log-type Tail --query 'LogResult' --output text | base64 -d | grep REPORT
done

Expected: a REPORT line on the first call with Init Duration: ...; subsequent calls warm. The CPU loop’s Duration will be high at 128 MB.

Step 4 — Raise memory to 1024 MB and re-measure. Same code, more CPU.

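aws lambda update-function-configuration --function-name $FN \
  --memory-size 1024 --region $REGION
# Wait for the update to finish so the next invoke really runs at 1024 MB
aws lambda wait function-updated --function-name $FN --region $REGION
aws lambda invoke --function-name $FN --region $REGION /dev/null \
  --log-type Tail --query 'LogResult' --output text | base64 -d | grep REPORT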

Expected: Duration drops sharply versus 128 MB — that’s CPU scaling with memory. Note Max Memory Used is far below 1024, so for a non-CPU-bound function you’d drop back down; here the CPU work justifies the memory.

Step 5 — Publish a version, point an alias at it, and add provisioned concurrency.

VER=$(aws lambda publish-version --function-name $FN --region $REGION --query Version --output text)
aws lambda create-alias --function-name $FN --name live \
  --function-version "$VER" --region $REGION
aws lambda put-provisioned-concurrency-config --function-name $FN \
  --qualifier live --provisioned-concurrent-executions 2 --region $REGION

Step 6 — Wait for PC to be READY, then confirm no cold start on the alias.

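# Poll until READY (don't serve traffic before this); stop if allocation FAILED
while true; do
  STATUS=$(aws lambda get-provisioned-concurrency-config --function-name $FN \
    --qualifier live --region $REGION --query Status --output text)
  echo "PC status: $STATUS"
  [ "$STATUS" = "READY" ] && break
  [ "$STATUS" = "FAILED" ] && { echo "PC allocation failed"; break; }
  sleep 5
done
aws lambda get-provisioned-concurrency-config --function-name $FN \
  --qualifier live --region $REGION \
  --query '{status:Status, allocated:AllocatedProvisionedConcurrentExecutions}'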

# Invoke the ALIAS — REPORT should have NO "Init Duration"
aws lambda invoke --function-name $FN:live --region $REGION /dev/null \
  --log-type Tail --query 'LogResult' --output text | base64 -d | grep REPORT

Expected: Status: READY, allocated: 2; the alias invocation’s REPORT has no Init Duration — the cold start is gone because the environment was pre-warmed.

The lab steps mapped to what each proves:

| Step | What you did | What it proves | Real-world analogue |
| --- | --- | --- | --- |
| 3 | Invoke at 128 MB, read REPORT | Init Duration only on cold starts; CPU is throttled | The “Lambda is slow” complaint |
| 4 | Raise to 1024 MB, re-measure | Memory is CPU: duration drops | The free right-sizing win |
| 5 | Version + alias + PC | PC must target a version/alias, never $LATEST | The deploy-then-shift model |
| 6 | Wait for READY, invoke alias | A warm pool removes the cold start | Buying down the tail on a sync API |

Cleanup (remove the idle PC charge first, then the function).

aws lambda delete-provisioned-concurrency-config --function-name $FN \
  --qualifier live --region $REGION
aws lambda delete-function --function-name $FN --region $REGION

Cost note. A handful of sub-second invocations is effectively free under the Lambda free tier; the only chargeable item is provisioned concurrency, which bills for the time it’s enabled — deleting it in cleanup (before the function) stops that immediately. The whole lab runs to a few rupees at most.

Common mistakes & troubleshooting

This is the playbook — the part you bookmark. First, the error/exception reference: the specific strings and codes you’ll see, what they mean on Lambda, and the fix — scan this before the symptom table.

| Error / code | Where it surfaces | What it means | Likely cause | Fix |
| --- | --- | --- | --- | --- |
| 429 TooManyRequestsException | Sync caller (API GW, invoke) | Concurrency limit hit | Reserved/account cap or burst ceiling | Raise reserved; quota increase; pre-scale PC |
| Rate Exceeded / ThrottlingException | Control-plane calls | API throttled | Too many config/list calls | Back off; batch; cache |
| ProvisionedConcurrencyConfigNotFoundException | get-provisioned-concurrency-config | No PC on that qualifier | PC not set, or wrong alias/version | Attach PC to the alias traffic uses |
| InvalidParameterValueException ($LATEST) | put-provisioned-concurrency-config | PC rejected on $LATEST | Targeted $LATEST | Publish a version; target an alias |
| Status: FAILED (PC) | PC config status | Allocation failed | Not enough account concurrency | Raise account limit; lower other PC |
| Task timed out after N seconds | Function logs | Hit the configured timeout | Slow code/dependency; timeout too low | Speed up; raise timeout (≤900 s) |
| Endpoint request timed out (504) | API Gateway | Function exceeded 29 s | Long sync work behind API GW | Make async; shorten; Step Functions |
| Runtime exited ... signal: killed | Function logs | OOM: exceeded memory | Memory too low for the workload | Raise memory; fix leak |
| connection ... too many connections | DB / dependency logs | DB max_connections hit | Connection blowup at scale | Init-scope reuse; RDS Proxy |
| ResourceConflictException | Deploy / config update | Concurrent update in progress | Overlapping deploys/config calls | Serialize; retry after settle |
| ENILimitReached / slow VPC cold start | VPC-attached function | Hyperplane ENI pressure | Many subnets/functions, small subnet | Right-size subnet CIDR; consolidate |
| Unable to import module 'handler' | Init phase | Init crashed before handler | Missing dep, bad layer, import error | Fix dependency/layer; check unzip -l |

Now the symptom → cause → confirm → fix table you read mid-incident, then the entries that bite hardest expanded with the exact commands.

| # | Symptom | Root cause | Confirm (exact cmd / metric) | Fix |
| --- | --- | --- | --- | --- |
| 1 | High p99 on a sync API; p50 fine | Cold starts on the critical path | Logs Insights: high coldStarts/invocations; X-Ray init subsegment | PC on the alias; SnapStart if eligible |
| 2 | Function “slow”, cost unchanged at 128 MB | CPU-starved at low memory | Power Tuning shows duration drops with memory | Raise to the Power-Tuning winner |
| 3 | Provisioned concurrency “not working” | PC on $LATEST, or not READY, or wrong qualifier | get-provisioned-concurrency-config Status ≠ READY | Target a version/alias; wait for READY |
| 4 | p99 still spikes despite PC | Demand exceeds the PC pool (spillover) | ProvisionedConcurrencySpilloverInvocations > 0 | Raise PC floor / AutoScaling max; pre-scale |
| 5 | Duplicate IDs / “same value” across requests | SnapStart cloned a UUID/seed | IDs identical post-restore; only on SnapStart fns | Regenerate in afterRestore, not at init |
| 6 | Auth/connection failures right after deploy | SnapStart restored stale creds/connections | Failures cluster on cold (restored) envs | Refresh creds/reconnect in afterRestore |
| 7 | First request slow even with SnapStart | JVM JIT/lazy-load not primed | First post-restore call slow, then fast | Prime hot paths in beforeCheckpoint |
| 8 | DB dependency timeouts at scale, fine at rest | Connection blowup (per-invocation connects) | RDS connection count near max_connections | Init-scope reuse; front with RDS Proxy |
| 9 | 429 TooManyRequestsException under load | Reserved/account concurrency limit hit | Throttles climbing; ConcurrentExecutions at cap | Raise reserved; Service Quotas increase |
| 10 | Throttles during a fast ramp, below account limit | Burst-concurrency ceiling outrun | ConcurrentExecutions slope caps then throttles | Pre-scale PC for the window; smooth spike |
| 11 | Async events delayed / landing in DLQ | Throttled async invocations retrying | Throttles on an async-triggered fn; DLQ depth | Raise concurrency; check the DLQ cause |
| 12 | Init phase takes seconds | Oversized package / eager init work | High Init Duration; large unzip -l | Trim deps, mark SDK external, lazy-init |
| 13 | Memory bill high, function fast | Over-allocated memory | Max Memory Used far below Memory Size | Drop memory toward peak + headroom |
| 14 | Costs jumped after a “latency fix” | Flat 24x7 provisioned concurrency | PC config shows a high static floor | AutoScaling: low floor + scheduled/target ramp |

The expanded form for the entries that cause the most wasted hours:

1. High p99 on a synchronous API, p50 healthy. Root cause: cold starts on the critical path — new environments paying init as traffic steps up. Confirm: Logs Insights shows a high count(@initDuration)/count(*) ratio on that function; X-Ray’s init subsegment shows the time landing in init, not your handler. Fix: right-size memory first (free), then provisioned concurrency on the alias for a flat tail; SnapStart if the runtime is Java/Python/.NET and cost matters more than absolute consistency.

3. Provisioned concurrency “isn’t doing anything.” Root cause: PC was put on $LATEST (rejected) or on a different qualifier than the one traffic hits, or it isn’t READY yet and traffic was shifted early. Confirm: aws lambda get-provisioned-concurrency-config --qualifier <alias> returns Status: IN_PROGRESS/FAILED, or the alias your API points to has no PC config. Fix: attach PC to the version/alias your traffic actually uses; wait for READY before promoting; ensure the API stage points at the PC-backed alias.

5 & 6. SnapStart cloned state / restored stale connections. Root cause: values generated once at snapshot (UUIDs, SecureRandom seed, timestamps) are identical across every restore; connections/creds captured live are dead/expired on restore. Confirm: duplicate IDs appear only on SnapStart-enabled functions and cluster on cold (restored) environments; auth/connection failures cluster immediately post-deploy. Fix: regenerate per-environment values and re-seed SecureRandom in afterRestore; refresh credentials and re-establish connections there too. Never rely on init-time randomness under SnapStart.

8. Relational dependency times out under load, fine at rest. Root cause: the handler opens a connection per invocation, so at N concurrent environments you need N connections; past max_connections the database refuses connects, surfacing as Lambda-side dependency timeouts. Confirm: RDS/Aurora connection count climbs toward max_connections exactly as Lambda concurrency rises; the failures are connects, not query errors. Fix: move the connection to init scope and reuse it; for high concurrency front the database with RDS Proxy so it multiplexes many environments onto a small backend pool.

9 & 10. Throttles — but which wall? Root cause: either a reserved-concurrency cap on the function, the account/regional limit, or the burst ceiling during a sharp ramp. Confirm: Throttles rising with ConcurrentExecutions pinned at the account limit → account cap; Throttles on one function with others fine → its reserved cap; throttles while ConcurrentExecutions is still climbing and below the limit → burst ceiling. Fix: Service Quotas increase ahead of a launch for the account cap; raise (or accept, if it’s protecting a downstream) the reserved cap; pre-scale provisioned concurrency for a known flash window the burst ramp can’t keep up with.

14. The bill jumped after someone “fixed” latency. Root cause: provisioned concurrency nailed to a flat 24x7 number, paying for warmth at 3 AM for a daytime workload. Confirm: the PC config shows a high static provisioned-concurrent-executions with no Application Auto Scaling target/schedule attached. Fix: set a small always-on floor and let Application Auto Scaling raise it on a schedule (diurnal peaks) or a utilization target; reserve a higher static floor only for genuine flash windows.

Best practices

Security notes

The security controls that also improve resilience — they pull the same direction here:

| Control | Mechanism | Secures against | Also prevents |
| --- | --- | --- | --- |
| IAM DB auth via RDS Proxy | Proxy IAMAuth: REQUIRED | Embedded DB passwords | Credential staleness under SnapStart |
| Least-privilege exec role | Scoped IAM policy | Over-broad blast radius | Accidental calls to wrong resources |
| Re-seed SecureRandom (afterRestore) | CRaC hook | Predictable/cloned randomness | Duplicate idempotency keys |
| Reserved concurrency cap | reserved-concurrent-executions | Runaway downstream abuse | DB connection exhaustion |
| KMS-encrypted env vars | Customer-managed key | Plaintext secret exposure | (audit/rotation hygiene) |
| Secrets Manager rotation | Automatic rotation | Long-lived static creds | Credential drift breaking the app |

Cost & sizing

The bill drivers, how they interact with the fixes, and what to watch:

A rough monthly picture for a single busy synchronous function, before vs after deliberate tuning:

Cost driver | What you pay for | Rough INR / month | What it fixes | Watch-out
GB-seconds (right-sized) | Memory × duration, tuned | varies with traffic | Slow warm path | Bills more if over-allocated
Provisioned concurrency (flat 300) | 24x7 warm pool | ~₹2.4 L (the anti-pattern) | Cold starts | Pays for 3 AM warmth
Provisioned concurrency (floor 10 + ramp) | Warm only at peak | ~₹70–80 k | Cold starts, cost-aware | Pre-scale for flash spikes
SnapStart | Per-invoke restore only | ~₹0 idle | JVM init tax | Snapshot caveats to code around
RDS Proxy | Hourly per-vCPU | ~₹1.5–3 k | Connection blowup | Needs VPC/IAM
Requests | Per invocation | scales with traffic | (inherent) | High-throughput accumulates
Observability | Logs/metrics/traces per GB | ~₹1–3 k | Tuning blind | Sample + set retention

The sizing rule in one line: find the cheapest memory that meets the warm-path latency, add only the warmth (PC or SnapStart) the cold-start SLO needs, and scale that warmth to demand. Solvent Pay landed at ~₹72,000/month after doing exactly this — down 70% from the flat-300 anti-pattern — proof that the fix is usually deliberate tuning, not a bigger anything.
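The 70% figure follows directly from the table's numbers, reading ₹2.4 L as ₹240,000 (1 lakh = 100,000). A quick arithmetic check, where the helper name is hypothetical:

```python
def percent_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


flat_300_monthly = 240_000  # ~₹2.4 L, the flat-300 anti-pattern row
tuned_monthly = 72_000      # ~₹72,000 after floor + ramp tuning

print(f"{percent_reduction(flat_300_monthly, tuned_monthly):.0f}% lower")
```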

Interview & exam questions

1. Why is a Lambda function at 128 MB not necessarily “cheap”? Lambda allocates CPU proportionally to memory, so a CPU-bound function at 128 MB runs on a sliver of a vCPU and takes far longer — and since billing is GB-seconds, the slower run can cost the same or more than a higher-memory run that finishes quickly, while delivering worse latency. Right-sizing with Power Tuning often lowers both latency and cost.
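A worked example of the GB-seconds arithmetic behind that answer. The two durations are hypothetical measurements of the kind Power Tuning would report for a CPU-bound function, and the per-GB-second rate is an assumed x86 on-demand price that you should check against your Region's pricing page.

```python
# Assumed on-demand x86 rate in USD per GB-second; verify for your Region.
PRICE_PER_GB_SECOND = 0.0000166667


def gb_seconds(memory_mb, duration_ms):
    """Billed compute for one invocation: memory (GB) x duration (s)."""
    return (memory_mb / 1024) * (duration_ms / 1000)


def compute_cost(memory_mb, duration_ms, invocations):
    """Duration cost only; the per-request charge is the same at any memory size."""
    return gb_seconds(memory_mb, duration_ms) * invocations * PRICE_PER_GB_SECOND


# Hypothetical measurements for the same CPU-bound function.
small = gb_seconds(128, 2400)   # roughly 0.30 GB-s per call, 2.4 s latency
large = gb_seconds(1024, 250)   # roughly 0.25 GB-s per call, 0.25 s latency

for label, mem, ms in (("128 MB", 128, 2400), ("1024 MB", 1024, 250)):
    print(label, f"{compute_cost(mem, ms, 1_000_000):.2f} USD per million calls")
```

With these assumed numbers the 1024 MB configuration is both faster and cheaper. If the function were I/O-bound and its duration barely moved, the same arithmetic would show the larger setting costing more.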

2. What are the three parts of a cold start, and which do you control most? Environment download/init (microVM provision, package/image pull, runtime start — mostly AWS), your init phase (imports, SDK clients, connections — the part you control and that’s billed), and the warm invoke (your handler). You have the most leverage over the init phase via package trimming and lazy initialization.

3. Provisioned concurrency vs SnapStart: when do you pick each? Provisioned concurrency pre-initializes a pool for the flattest tail on any runtime, but you pay for it whenever it’s enabled. SnapStart restores from a snapshot of init (Java/Python/.NET) with no warm pool to pay for while idle (on Java it carries no extra charge; on Python and .NET the snapshot cache and each restore are billed), but it adds snapshot caveats and restore/priming variance. Pick PC for strict p99 or unsupported runtimes; pick SnapStart for cost-sensitive JVM/.NET/Python cold starts; combine them for JVM workloads with both pressures.

4. Why must provisioned concurrency target a version or alias, never $LATEST? $LATEST is mutable, so PC can’t guarantee a stable, pre-initialized snapshot of code/config against it — AWS rejects it. Targeting an immutable version (usually via an alias) enforces a deploy-then-shift model where you publish, warm, confirm READY, then move traffic.

5. What breaks if you don’t implement afterRestore under SnapStart? Anything generated once at snapshot — SecureRandom seeds, UUIDs, timestamps — is cloned identically across every restored environment, and captured connections/credentials may be dead or expired. You get duplicate IDs (e.g. idempotency keys), predictable randomness (a security flaw), and auth/connection failures on cold starts. afterRestore re-seeds and refreshes these per environment.
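The answer above describes the Java CRaC hook. As a sketch of the same idea in Python, SnapStart's Python runtimes expose runtime hooks through the `snapshot_restore_py` module. The fallback import exists only so the example runs outside Lambda, and `ENVIRONMENT_ID`, `refresh_identity` and the key format are hypothetical names chosen for illustration.

```python
import itertools
import uuid

try:
    # Provided by Lambda's SnapStart-capable Python managed runtimes.
    from snapshot_restore_py import register_after_restore
except ImportError:
    # Local fallback so the sketch runs outside Lambda (assumption for testing).
    def register_after_restore(func):
        return func

# Created during init, so under SnapStart these values are frozen into the
# snapshot and would be identical in every restored environment.
ENVIRONMENT_ID = uuid.uuid4().hex
_sequence = itertools.count()


def refresh_identity():
    """Regenerate per-environment state; intended to run after each restore."""
    global ENVIRONMENT_ID, _sequence
    ENVIRONMENT_ID = uuid.uuid4().hex
    _sequence = itertools.count()


# Register the hook so the runtime calls it after restoring the snapshot.
register_after_restore(refresh_identity)


def handler(event, context):
    # Without the hook, two restored environments would both emit "<same id>-0".
    return {"idempotency_key": f"{ENVIRONMENT_ID}-{next(_sequence)}"}
```

The same hook is the natural place to reopen connections and refresh credentials captured at snapshot time.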

6. How does SnapStart’s beforeCheckpoint hook help latency? Restore is fast, but the JVM still JIT-compiles and lazy-loads on the first real request, so the first post-restore call can be slow. beforeCheckpoint runs at publish time and lets you prime hot paths (exercise serializers, make a dummy SDK call) so that compiled/loaded state is captured in the snapshot and the first real request is already fast.

7. What is connection blowup on Lambda, and how do you fix it? Each concurrent environment that opens its own database connection multiplies connections by concurrency; at a few hundred concurrent executions you exhaust the database’s max_connections and it refuses connects, surfacing as dependency timeouts. Fix by opening connections in init scope and reusing them, and by fronting the database with RDS Proxy, which multiplexes many environments onto a small backend pool.

8. Reserved vs provisioned concurrency? Reserved concurrency carves a guaranteed-and-capped slice of the account limit for a function (protecting downstreams and partitioning the account) but does not pre-warm anything. Provisioned concurrency keeps a set number of environments initialized and ready; it counts against the account limit and, when the function also has a reserved cap, must fit inside it, but on its own it caps nothing, because overflow spills to on-demand environments. Reserved bounds; provisioned warms.

9. How does a throttle surface differently for synchronous, asynchronous, and event-source invocations? Synchronous callers (API Gateway, direct invoke) get an immediate 429 TooManyRequestsException and must retry themselves. Asynchronous invocations (S3, SNS, EventBridge) are queued and retried by Lambda with backoff; events that exhaust their retries or age out go to a DLQ or on-failure destination if one is configured, and are dropped otherwise. Event source mappings (SQS, Kinesis, DynamoDB Streams) retry per the source’s rules, so the backlog and iterator age grow.
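For the synchronous case, "must retry themselves" usually means capped exponential backoff with jitter on the caller's side. A minimal sketch, in which `TooManyRequests` is a stand-in exception for the 429 response and `invoke_with_backoff` is a hypothetical helper, not an SDK function:

```python
import random
import time


class TooManyRequests(Exception):
    """Stand-in for the 429 TooManyRequestsException of a throttled sync invoke."""


def invoke_with_backoff(invoke, max_attempts=5, base_delay=0.1,
                        max_delay=2.0, sleep=time.sleep):
    """Call `invoke()` and retry throttles with full-jitter exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return invoke()
        except TooManyRequests:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttle to the caller
            # Full jitter: wait a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
```

The jitter matters during a throttle wall: without it, every throttled caller retries at the same instant and recreates the spike that caused the throttle.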

10. You enabled provisioned concurrency but p99 still spikes under load. Why and what do you check? Demand is exceeding the provisioned pool, so the overflow runs on-demand and cold-starts. Check ProvisionedConcurrencySpilloverInvocations — any sustained non-zero value means raise the PC floor or the Application Auto Scaling max, and pre-scale for known flash windows since target-tracking reacts too slowly for sharp spikes.

11. What’s the single fastest way to tell a cold start from a warm one in the logs? The REPORT line includes Init Duration only on cold starts. Filter on its presence (e.g. count(@initDuration) in Logs Insights) to measure cold-start frequency and cost without any extra instrumentation.
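The same check can be done offline on exported log lines. The sketch below assumes the usual tab-separated REPORT layout; the sample lines and request IDs are made up for illustration, and the function names are hypothetical.

```python
def parse_report(line):
    """Split a Lambda REPORT log line into a {field: value} dict, else None."""
    if not line.startswith("REPORT "):
        return None
    fields = {}
    for part in line.strip().split("\t"):
        key, sep, value = part.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def cold_start_summary(lines):
    """Count cold starts by the presence of the Init Duration field."""
    reports = [r for r in map(parse_report, lines) if r is not None]
    init_ms = [float(r["Init Duration"].split()[0])
               for r in reports if "Init Duration" in r]
    return {
        "invocations": len(reports),
        "cold_starts": len(init_ms),
        "avg_init_ms": sum(init_ms) / len(init_ms) if init_ms else 0.0,
    }


# Illustrative sample lines, not real log output.
sample = [
    "REPORT RequestId: a1\tDuration: 102.25 ms\tBilled Duration: 103 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 90 MB\tInit Duration: 412.80 ms",
    "REPORT RequestId: b2\tDuration: 8.10 ms\tBilled Duration: 9 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 91 MB",
    "END RequestId: b2",
]
summary = cold_start_summary(sample)
```

Here one of the two invocations carries Init Duration, so the summary reports one cold start out of two invocations.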

12. Why pre-scale provisioned concurrency for a flash sale instead of relying on target-tracking? Target-tracking Application Auto Scaling reacts over minutes, and even on-demand scale-out is bounded by the burst-concurrency ceiling, so a spike that arrives in seconds outruns both and cold-starts (or throttles). A scheduled action that ramps the PC floor before the known window keeps the pool warm ahead of the traffic.

These map to AWS Certified Developer – Associate (DVA-C02), which covers developing, deploying and troubleshooting serverless applications, Lambda configuration, concurrency, and observability, and to AWS Certified Solutions Architect – Associate (SAA-C03) for the architecture trade-offs (PC vs SnapStart, RDS Proxy, API Gateway fronting). The performance-and-cost optimization angle touches AWS Certified DevOps Engineer – Professional (DOP-C02). A compact cert mapping:

Question theme | Primary cert | Objective area
Memory-as-CPU, right-sizing, GB-seconds | DVA-C02 | Optimize serverless cost/performance
PC vs SnapStart, aliases/versions | DVA-C02 | Deploy & configure Lambda
Concurrency model, throttles, quotas | DVA-C02 / SAA-C03 | Resilient serverless design
RDS Proxy, connection reuse | SAA-C03 | Design scalable data tiers
CloudWatch/X-Ray, Logs Insights | DVA-C02 / DOP-C02 | Instrument & troubleshoot
AutoScaling PC, pre-scaling for spikes | DOP-C02 | Automation & scaling

Quick check

  1. A synchronous API’s p50 is 40 ms but p99 is 6 seconds under load. What’s the most likely cause, and the first (free) lever before you spend money?
  2. True or false: scaling a function to more memory always costs more.
  3. You put provisioned concurrency on a function but it still cold-starts. Name two things to check.
  4. Under SnapStart, two environments returned the same idempotency key. What went wrong and where do you fix it?
  5. Your relational database starts refusing connections exactly as Lambda concurrency climbs past a few hundred. What’s happening and what’s the fix?

Answers

  1. Cold starts on the critical path as new environments spin up under load (p50 is the warm path, p99 the cold tail). The first lever is right-sizing memory with Power Tuning — it’s free and often cuts the warm duration too; only then do you buy warmth (provisioned concurrency on the alias, or SnapStart if the runtime supports it).
  2. False. Memory is CPU, so more memory can make a CPU-bound function finish in far fewer GB-seconds — lowering the bill and the latency. You only overspend if you allocate memory the function doesn’t use (check Max Memory Used).
  3. Check (a) that PC targets a version or alias your traffic actually hits, never $LATEST, and (b) that its Status is READY (not IN_PROGRESS/FAILED) before traffic was shifted. Also watch ProvisionedConcurrencySpilloverInvocations — non-zero means demand exceeds the pool.
  4. SnapStart captured a value generated once at snapshot (a UUID/SecureRandom seed) and cloned it across every restored environment. Regenerate per-environment values and re-seed randomness in the afterRestore hook, not at init/class-load.
  5. Connection blowup — each concurrent environment opened its own connection, exhausting the database’s max_connections. Fix by reusing the connection in init scope and fronting the database with RDS Proxy, which multiplexes many environments onto a small backend pool.

Glossary

Next steps

You can now decompose Lambda latency, right-size for free, buy down the cold start that remains, and prove every change. Build outward:
