Optimizing AWS Lambda Performance: Cold Starts, Provisioned Concurrency, SnapStart, and Memory Tuning

“Lambda is slow” is almost never true. What is true is that an under-tuned function pays for a cold start it could have priced away, runs on a fraction of a vCPU because someone set 128 MB and forgot, and opens a fresh database connection on every invocation because the handler does its work in the wrong scope. Latency on AWS Lambda — the event-driven compute service that runs your code in ephemeral, auto-scaled execution environments — is a tuning problem, not a platform limit. The platform gives you a precise set of levers; the skill is reaching for them in the right order and proving each one moved the number you care about.

This guide walks those levers in order of leverage: understand the cold start, tune memory (which is also CPU), then decide whether provisioned concurrency or SnapStart is justified, fix connection reuse, and plan concurrency so a load spike does not turn into a wall of throttles. Because this is a reference you will return to mid-incident — when p99 has blown past your SLA at 18:03 on a flash-sale Friday — the playbook itself, the metrics, the runtime support matrix, the limits and the cost drivers are all laid out as scannable tables. Read the prose once to build the mental model, then keep the tables open when you are actually tuning.

By the end you will stop guessing. You will know whether a slow request is an oversized init phase, a CPU-starved 128 MB function, a spillover cold start past your provisioned pool, a SnapStart restore that cloned a UUID across every environment, a connection blowup against max_connections, or a throttle wall you could have seen coming in the Throttles metric. Knowing which — from the REPORT line and three CloudWatch metrics — is what separates a five-minute tune from a two-hour stare at the wrong dashboard.

What problem this solves

Serverless promises you ship a handler and forget the fleet. That abstraction is a gift until latency matters, and then the very thing that makes Lambda elastic — spinning a fresh execution environment on demand — becomes the thing your p99 trips over. A synchronous API behind API Gateway has a hard ceiling (29 seconds at the gateway), a card network might hold you to a contractual p99 under 800 ms, and a flash sale can ask for 10x your steady-state concurrency in ninety seconds. Under that pressure the defaults betray you: 128 MB of memory means a sliver of a vCPU, no provisioned warmth means every scale-out instance cold-starts, and a handler that opens a connection per call melts your database long before Lambda itself complains.

What breaks without this knowledge is predictable and expensive. An engineer “fixes” a latency spike by nailing provisioned concurrency to a flat 300 around the clock — paying for 300 warm JVMs at 3 AM for a daytime workload — and the finance team notices. Another sets 128 MB to “save money” and the CPU-bound function runs 14x slower at the same GB-second cost, so the bill is identical and the latency is terrible. A third moves heavy init into the handler, so every warm invocation re-opens a connection, and at 500 concurrent environments the relational database hits max_connections and starts refusing everyone. None of these are platform limits. Every one is a tuning decision made blind.

Who hits this: anyone running Lambda on a latency-sensitive or high-concurrency path. It bites hardest on JVM and .NET functions (heavy init, JIT/class-load tax on cold start), synchronous APIs with a strict tail-latency SLO, relational-database-backed functions at scale (connection exhaustion), and spiky workloads (throttles and burst-limit walls). The fix is almost never “use more memory blindly” or “buy warmth everywhere” — it is measure the init, right-size first because it is free, then buy down only the cold start that remains on the paths whose SLO actually requires it.

To frame the whole field before the deep dive, here is every latency symptom this guide addresses, the question it forces, and the first lever to reach for:

Symptom	What is actually happening	First question to ask	First place to look	First lever
Slow first request after idle	Fresh environment paying init	Is this cold (has `Init Duration`) or warm?	`REPORT` line in CloudWatch Logs	Trim init; consider PC/SnapStart
Function feels slow, no SLO pressure	Under-provisioned CPU at low memory	Is it CPU-bound at the current memory?	Lambda Power Tuning sweep	Right-size memory (free)
High p99 on a synchronous API	Cold starts on the critical path	Is p99 driven by `Init Duration`?	X-Ray init subsegment; Logs Insights	Provisioned concurrency on the alias
JVM/.NET cold starts dominate, cost-sensitive	Class-load + JIT tax per cold env	Is the runtime SnapStart-eligible here?	Runtime + region support matrix	SnapStart with priming hooks
DB connection errors at scale	New connection per invocation	Are connections opened in the handler?	RDS connection count vs cap	Init-scope reuse + RDS Proxy
429s / requests dropped under spike	Concurrency limit / burst ceiling hit	Reserved cap, account cap, or burst?	`Throttles` + `ConcurrentExecutions`	Reserve to partition; raise quota

Learning objectives

By the end of this article you can:

Decompose a cold start into its three measurable parts (environment download/init, your init phase, warm invoke) and read each from the CloudWatch REPORT line.
Right-size memory with AWS Lambda Power Tuning, understanding that Lambda allocates CPU proportionally to memory, and prove the change cut both latency and cost with a representative payload.
Decide between provisioned concurrency and SnapStart for a given function — and configure both correctly (alias/version, never $LATEST; CRaC hooks for SnapStart) — driving PC with Application Auto Scaling rather than a flat 24x7 number.
Move connection and client setup into init scope for reuse, and front a relational database with Amazon RDS Proxy to survive high concurrency without exhausting max_connections.
Plan concurrency: reserved vs provisioned vs the regional limit, the burst ceiling, and how throttles surface differently for synchronous vs asynchronous vs event-source invocations.
Instrument with CloudWatch Logs Insights, Lambda Insights, and AWS X-Ray to quantify cold-start frequency, init cost, p99, peak memory, spillover and throttles.
Read the limits, the runtime support matrix, and the cost drivers as reference tables, and pick the cheapest configuration that meets a stated latency SLO.

Prerequisites & where this fits

You should already be comfortable authoring and deploying a Lambda function (a handler, a deployment package or container image, an execution role), reading JSON, and running the aws CLI. You should understand what an alias and a version are, that API Gateway can front a function synchronously, and the basics of how Lambda scales (one environment serves one request at a time; concurrency is the count of in-flight executions). Familiarity with a VPC, security groups, and a relational database connection pool helps for the RDS Proxy material.

This sits in the performance and cost-optimization layer of the serverless track. The mechanics underneath it — runtimes, triggers, layers, the full concurrency model — are covered in AWS Lambda, In Depth: Runtimes, Triggers, Layers, Concurrency & Every Setting, which is upstream of this article. The synchronous entry path and its 29-second ceiling come from Amazon API Gateway, In Depth: REST vs HTTP vs WebSocket APIs, Integrations & Authorizers. The connection-pooling fix is a deep topic in its own right — see RDS Proxy in Production: Connection Pooling, Failover Acceleration, and IAM Authentication. And the measurement layer that proves every lever worked lives in AWS Observability, In Depth: CloudWatch, CloudTrail, Config & EventBridge and Distributed Tracing on AWS with X-Ray: Service Maps, Segments, and ADOT on EKS.

A quick map of who owns which lever, so you pull the right one and call the right person:

Layer	What lives here	Who usually owns it	Latency failure it can cause
Client / API Gateway	TLS, request routing, 29 s timeout	Frontend / API team	Timeout if function exceeds 29 s; retries amplify load
Function config	Memory, timeout, alias, PC, SnapStart	App / platform team	Slow CPU at low memory; cold starts; spillover
Function code (init scope)	Imports, SDK clients, connections	App / dev team	Oversized init; per-invocation connection blowup
Concurrency controls	Reserved, provisioned, account quota	Platform / SRE	Throttles (429); burst-ceiling wall
Downstream (RDS / DynamoDB)	Connection pool, capacity	Data team	`max_connections` exhaustion; dependency latency
Observability	Logs, metrics, traces	SRE / platform	Tuning blind; can’t prove a change worked

Core concepts

Five mental models make every later decision obvious.

A cold start is the work before your handler runs on a fresh environment. When Lambda needs a new execution environment it does three things: provisions the microVM and pulls your package or image, runs your init phase (everything outside the handler: imports, SDK clients, static config, connection setup), then runs your handler. The first two parts happen once per environment; the init phase is billed, while the platform’s provisioning and download step is not. After the first invocation the environment is reused (a warm invocation) until it is recycled. “Cold start” is those first two parts; everything you do to fight latency is either making them cheaper, making them happen ahead of traffic, or avoiding them entirely.

Memory is CPU. Lambda allocates CPU proportionally to the memory you configure. At 1,769 MB a function gets the equivalent of one full vCPU; below that you get a fraction, above it more than one (up to ~6 vCPUs at 10,240 MB). A CPU-bound function at 128 MB is not “cheap” — it runs roughly 14x slower than at 1,769 MB, and because Lambda bills GB-seconds (memory × duration), the slower run can cost the same or more while delivering far worse latency. This is the single highest-leverage knob and the most misunderstood.

Warmth is something you buy, two ways. Provisioned concurrency keeps a pool of environments fully initialized and ready, so the init phase has already happened before traffic arrives — you pay for that warmth continuously, whether or not it is invoked. SnapStart instead runs init once at publish time, snapshots the initialized microVM, and restores from the snapshot on cold start — no idle charge, but you inherit snapshot caveats (cloned uniqueness, stale connections, JIT priming). They solve the same problem with opposite cost shapes.

Concurrency is finite and shared. Concurrency is the number of in-flight executions. Your account has a regional concurrency limit (1,000 by default, raisable). Reserved concurrency carves a guaranteed-and-capped slice out of that pool for one function; provisioned concurrency is a pre-warmed pool that counts against the same limit and, when the function has reserved concurrency set, must fit inside it. There is also a burst ceiling governing how fast you can scale from cold. Exceed any of these and Lambda throttles, and how that throttle surfaces depends on how the function was invoked.

Connections must live in init scope. Anything expensive to create — a database connection, an HTTP client, a secret fetch — belongs outside the handler, in init scope, so it is created once per environment and reused by every warm invocation. Put it inside the handler and it runs on every call, adding latency and, at scale, exhausting downstream connection limits. This single discipline prevents most self-inflicted Lambda latency.

The vocabulary in one table

Before the deep sections, pin down every moving part. The glossary repeats these for lookup; this is the model side by side:

Concept	One-line definition	Where it lives	Why it matters to latency
Execution environment	The microVM that runs one invocation at a time	Lambda-managed	Cold when new; warm when reused
Init phase	Code outside the handler, run once per env	Your code (module scope)	Dominates cold-start cost you control
Cold start	Env provision + init before first invoke	Lifecycle	The latency you are fighting
Warm invoke	Handler on a reused environment	Lifecycle	The fast path; skips init
Memory (MB)	Configured RAM; also sets CPU share	Function config (128–10240)	More memory = more CPU = faster
Provisioned concurrency	Pre-initialized, always-ready pool	On a version/alias	Removes cold start; idle cost
SnapStart	Snapshot-restore instead of re-init	On published versions	Removes most init; no idle cost
Reserved concurrency	Guaranteed + capped slice of the account	Per function	Partitions; protects downstreams
Throttle	Invocation rejected at a concurrency limit	Runtime behaviour	429 sync; retries async
RDS Proxy	Pools/multiplexes DB connections	In front of RDS/Aurora	Stops connection blowup at scale
GB-second	Billing unit: memory × duration	Billing	Why slow-and-small ≠ cheap

Anatomy of a cold start

A cold start has three measurable parts, and only one of them is fully yours to control. Knowing which part dominates tells you which lever to pull.

Part	What happens	Billed?	Who controls it	How to reduce
Download / env init	Provision microVM, pull package/image, start runtime	No (platform)	Mostly AWS; you affect package/image size	Smaller artifact; zip over large container; fewer layers
Init phase (your code)	Imports, SDK clients, static config, connections	Yes	You (module scope)	Trim deps, lazy-init, mark SDK external
Invoke (warm path)	Your handler body	Yes	You	Right-size memory; efficient code

The init phase is where you have the most leverage, and two things dominate it: package size and what your code does at import time. A 250 MB unzipped bundle that eagerly constructs a dozen SDK clients and reads SSM parameters synchronously will have an init phase measured in seconds. Trim both.

# What is actually in the bundle? Init time tracks closely with this.
unzip -l function.zip | tail -1

# For Node, prune dev deps and bundle/tree-shake so only used code ships
npm prune --omit=dev
npx esbuild src/handler.js --bundle --minify --platform=node \
  --target=node20 --external:@aws-sdk/* --outfile=dist/handler.js

The AWS SDK v3 (@aws-sdk/*) and boto3 are already present in the managed runtimes. Marking the SDK --external and not bundling it keeps your artifact small. Pin to a layer only if you need behaviour the runtime’s bundled SDK lacks.

You read the init duration directly from the REPORT line in CloudWatch Logs — Init Duration appears only on cold-start invocations, which makes it a clean signal to filter on. Here is exactly what each REPORT field tells you and how to act on it:

`REPORT` field	What it measures	Read it as	Action if it’s high
`Init Duration`	Time in your init phase (cold only)	Cold-start cost you own	Trim package/init; PC or SnapStart
`Duration`	Handler execution time	Warm-path latency	Right-size memory; profile code
`Billed Duration`	What you pay for (rounded up to 1 ms)	The bill driver	Lower memory only if not CPU-bound
`Max Memory Used`	Peak memory of the invocation	Headroom vs configured	Drop memory if far below; raise if near
`Memory Size`	Configured memory	Your setting	The knob you tune
`XRAY TraceId` (if active)	Trace correlation	Where to drill in X-Ray	Open the trace for segment breakdown
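
To make the table concrete, here is a minimal sketch of pulling those fields out of a REPORT line in Python. The function name `parse_report` and the sample line in the usage note are illustrative and not part of any AWS library; the sketch assumes the usual `Name: value unit` layout of the REPORT line.

```python
import re

# Matches "Name: 12.34 ms" or "Name: 128 MB" pairs inside a Lambda REPORT line.
# "Init Duration" and "Billed Duration" are listed before the bare "Duration"
# so the longer names win when they start at the same position.
_FIELD = re.compile(
    r"(Init Duration|Billed Duration|Duration|Memory Size|Max Memory Used)"
    r": ([\d.]+) (ms|MB)"
)


def parse_report(line: str) -> dict:
    """Return the numeric REPORT fields plus a 'cold' flag.

    'cold' is True only when the line carries an Init Duration, which
    Lambda emits on cold-start invocations.
    """
    fields = {name: float(value) for name, value, _unit in _FIELD.findall(line)}
    fields["cold"] = "Init Duration" in fields
    return fields
```

Feeding it a line such as `REPORT RequestId: abc	Duration: 102.25 ms	Billed Duration: 103 ms	Memory Size: 1024 MB	Max Memory Used: 87 MB	Init Duration: 412.33 ms` yields a dict whose `cold` flag is set and whose `Init Duration` is the cold-start cost you own.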

Two cold-start facts worth internalizing, because they shape everything downstream. First, the init phase runs with a brief CPU boost in unprovisioned environments — AWS gives init extra CPU regardless of your memory setting — which is why a heavy init isn’t quite as slow as the same work mid-handler, but it is still billed and still on the critical path. Second, init has a 10-second soft budget before the platform may retry the initialization; an init that legitimately needs longer is a design smell. The init-cost contributors, ranked:

Init cost	Typical magnitude	Reduce it by	Trade-off
Package / image pull	100 ms – several s (size-dependent)	Smaller artifact; zip vs big container; layer hygiene	Build discipline
Runtime boot	50 ms – 1 s	Lighter runtime; avoid heavy frameworks	Framework features lost
SDK client construction	50–500 ms each	Construct in init scope once; only what you use	Slightly more module code
Synchronous config fetch (SSM/Secrets)	50 ms – seconds	Cache; fetch fewer params; batch	Less granular config refresh
Framework / DI graph (Spring etc.)	1–10+ s (JVM/.NET)	SnapStart + priming; lighter framework	Complexity; framework lock-in
First DB connect / pool prime	50 ms – seconds	Init scope; pooled driver; RDS Proxy	First real request still primes
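
One way to act on the "only what you use" and lazy-init advice is to build each client on first use and cache it for the life of the environment. The sketch below is a minimal illustration under that assumption; `get_client` and `_build_client` are hypothetical helper names, and the injectable `factory` argument exists only so the pattern can be exercised without AWS credentials.

```python
import functools


def _build_client(service_name: str):
    # Imported lazily so the import cost is paid only when a client is needed.
    import boto3

    return boto3.client(service_name)


@functools.lru_cache(maxsize=None)
def get_client(service_name: str, factory=_build_client):
    """Build the client on first use, then reuse it on every warm invocation."""
    return factory(service_name)
```

The trade-off is that the construction cost moves from the init phase to the first invocation that needs the client, so this suits clients used on rare code paths rather than on every request.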

Memory is CPU: right-size with Lambda Power Tuning

This is the highest-leverage knob and the one most teams get wrong by guessing. Because CPU scales with memory, a CPU-bound function at 128 MB is slow and not actually cheaper: it runs longer at a lower per-millisecond price, so the GB-seconds come out about the same. Do not guess. Run AWS Lambda Power Tuning, an open-source Step Functions state machine that invokes your function across a memory sweep and plots cost against speed. (Step Functions itself is covered in AWS Step Functions in Production: Express vs Standard, Distributed Map, and Resilient Error Handling.)

# Deploy the tuner from a local checkout of its SAM template
# (the project is also published in the Serverless Application Repository)
sam deploy \
  --template-file template.yaml \
  --stack-name lambda-power-tuning \
  --capabilities CAPABILITY_IAM \
  --parameter-overrides "PowerValues=128,256,512,1024,1536,1769,3008"

Once the stack is up, start an execution of the tuner’s state machine with an input like this:

{
  "lambdaARN": "arn:aws:lambda:us-east-1:111122223333:function:order-processor",
  "powerValues": [128, 256, 512, 1024, 1536, 1769, 3008],
  "num": 50,
  "payload": { "orderId": "test-123" },
  "strategy": "balanced"
}

The strategy you pick changes what the tuner optimizes for — choose it for the path’s actual goal, not by reflex:

Strategy	Optimizes for	Use it when	Risk if misused
`cost`	Cheapest acceptable config	Batch / async, no latency SLO	Picks low memory → slow for users
`speed`	Fastest config	Latency-critical synchronous path	Overspends on memory you don’t need
`balanced`	Best cost-vs-speed tradeoff	Default; most functions	May miss a strict p99 target

The single most important rule: tune with a representative payload. A synthetic empty event under-exercises the function and lies about the optimum; a real-shaped payload reveals the true CPU profile. I have repeatedly found that moving a JSON-crunching function from 512 MB to 1024 MB more than halves duration, which lowers cost because twice the memory for less than half the time is fewer GB-seconds. Memory is also the only knob that can move both axes at once; most levers trade cost for latency, this one can improve both:

Memory	Approx vCPU share	Best for	Cost note
128 MB	~0.07 vCPU (a sliver)	Trivial glue, no CPU work	“Cheap” only if truly I/O-bound and idle-fast
512 MB	~0.29 vCPU	Light transforms	Often slower and not cheaper than 1024 for CPU work
1024 MB	~0.58 vCPU	Common sweet spot for APIs	Frequently the balanced-strategy winner
1769 MB	~1.00 vCPU (full)	CPU-bound work; JVM	Below this, single-threaded code can’t use a full core
3008 MB	~1.70 vCPU	Parallel / heavy compute	More cores; watch GB-second cost
10240 MB	~6 vCPU (max)	Multi-threaded, compute-heavy	Max CPU; only if the code parallelizes
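
The cost side of that table is plain arithmetic, sketched below. The model is deliberately idealized: it assumes a single-threaded, purely CPU-bound function whose duration scales inversely with CPU share up to one vCPU at 1,769 MB. Real functions mix I/O and CPU, which is why you measure with Power Tuning rather than trusting a formula. Both function names are illustrative.

```python
def gb_seconds(memory_mb: int, duration_ms: float) -> float:
    """Billing quantity for one invocation: memory in GB times duration in seconds."""
    return (memory_mb / 1024) * (duration_ms / 1000)


def cpu_bound_duration_ms(
    base_duration_ms: float,
    base_memory_mb: int,
    memory_mb: int,
    full_vcpu_mb: int = 1769,
) -> float:
    """Idealized duration at a new memory size for single-threaded CPU-bound code.

    Assumes duration is inversely proportional to CPU share, and that
    memory beyond one full vCPU gives single-threaded code no extra speed.
    """
    base_share = min(base_memory_mb, full_vcpu_mb)
    new_share = min(memory_mb, full_vcpu_mb)
    return base_duration_ms * base_share / new_share
```

Under this model a 2,000 ms run at 512 MB becomes 1,000 ms at 1,024 MB for the same 1.0 GB-second: half the latency at identical cost. Any speed-up better than linear makes the larger size strictly cheaper, and any memory added past the point where duration stops dropping is pure extra cost.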

Apply the winner explicitly, and verify it took:

aws lambda update-function-configuration \
  --function-name order-processor --memory-size 1024

aws lambda get-function-configuration \
  --function-name order-processor --query 'MemorySize'

# Terraform — pin the tuned memory as code
resource "aws_lambda_function" "order_processor" {
  function_name = "order-processor"
  role          = aws_iam_role.lambda.arn
  handler       = "handler.handler"
  runtime       = "nodejs20.x"
  memory_size   = 1024   # from Power Tuning, balanced strategy
  timeout       = 10
}

The common right-sizing mistakes, and the REPORT/Power-Tuning evidence that exposes each:

Mistake	What you see	Evidence	Fix
Stuck at 128 MB “to save money”	High `Duration`, same/worse cost	CPU-bound; duration drops sharply with memory	Move to the Power-Tuning winner
Over-allocated memory	`Max Memory Used` far below `Memory Size`	Logs Insights `peakMemMB`	Drop memory toward peak + headroom
Tuned with empty payload	“Optimum” disagrees with prod latency	p99 in prod ≠ tuner result	Re-tune with a representative event
One size for all functions	Some over-, some under-provisioned	Per-function `peakMemMB` spread	Tune each function independently

Provisioned concurrency: pre-warmed capacity

If your tuned function still cannot tolerate cold starts on the critical path — a synchronous API behind API Gateway, a checkout flow — provisioned concurrency (PC) keeps a pool of environments initialized and ready, so the init phase has already happened before traffic arrives. It is configured against a version or alias — never $LATEST, which forces a clean deploy-then-shift model.

# Publish an immutable version, capture its number, then point the alias at it
VERSION=$(aws lambda publish-version \
  --function-name order-processor \
  --query 'Version' --output text)

aws lambda update-alias \
  --function-name order-processor \
  --name live \
  --function-version "$VERSION"

aws lambda put-provisioned-concurrency-config \
  --function-name order-processor \
  --qualifier live \
  --provisioned-concurrent-executions 20

Static provisioning wastes money outside peak. Drive it with Application Auto Scaling on a schedule or a utilization target so you pay for warmth only when you need it:

aws application-autoscaling register-scalable-target \
  --service-namespace lambda \
  --resource-id function:order-processor:live \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --min-capacity 5 --max-capacity 100

aws application-autoscaling put-scaling-policy \
  --service-namespace lambda \
  --resource-id function:order-processor:live \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --policy-name pc-utilization \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 0.7,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
    }
  }'

Key facts to internalize: you pay for provisioned concurrency for the time it is enabled, whether or not it is invoked, plus the usual per-request charge and a reduced duration charge when it is used. If demand exceeds your provisioned pool, the overflow spills to standard on-demand concurrency and those requests do cold-start. The ProvisionedConcurrencySpilloverInvocations metric is your floor-too-low alarm. Here is every PC configuration knob and how to reason about it:

Setting	What it controls	Values	When to change	Gotcha
`--qualifier`	Version/alias PC attaches to	a version number or alias	Always an alias for deploy-shift	`$LATEST` is rejected — by design
`--provisioned-concurrent-executions`	Size of the warm pool	1 … (≤ reserved/account)	Match the warm floor you need	Counts against account concurrency
AutoScaling `min-capacity`	Floor PC never drops below	≥ 0	Always-on readiness baseline	Too high = idle waste
AutoScaling `max-capacity`	Ceiling for scale-up	≤ account limit	Cap the spend / protect downstreams	Too low = spillover under spike
`TargetValue` (utilization)	Target PC utilization	0.0–1.0 (e.g. 0.7)	Tighter = more headroom, more cost	Too high = spill before scale reacts
Scheduled action	Time-based floor changes	cron/rate	Predictable diurnal peaks	Time-zone mistakes ramp at the wrong hour

The states a PC config moves through, and what each means for traffic — do not shift traffic until it is READY:

PC `Status`	Meaning	Safe to serve?	What to do
`IN_PROGRESS`	Environments still initializing	No (serves cold meanwhile)	Wait; don’t promote the alias yet
`READY`	Pool fully warm and allocated	Yes	Shift traffic to this alias
`FAILED`	Allocation failed	No	Check account concurrency / quota; retry

# Don't promote until READY — and confirm the allocated count
aws lambda get-provisioned-concurrency-config \
  --function-name order-processor --qualifier live \
  --query '{status:Status, allocated:AllocatedProvisionedConcurrentExecutions}'
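
In a deploy script the same check is usually a polling loop. The sketch below assumes a boto3 Lambda client is passed in and relies only on its `get_provisioned_concurrency_config` call; the function name `wait_for_pc_ready`, the timeout and the poll interval are illustrative choices, not AWS defaults.

```python
import time


def wait_for_pc_ready(client, function_name, qualifier,
                      timeout_s=300, poll_s=5, sleep=time.sleep):
    """Poll until the provisioned concurrency config reports READY.

    Raises RuntimeError if allocation FAILED and TimeoutError if the
    pool is still initializing after timeout_s seconds of waiting.
    """
    waited = 0
    while True:
        cfg = client.get_provisioned_concurrency_config(
            FunctionName=function_name, Qualifier=qualifier
        )
        status = cfg["Status"]
        if status == "READY":
            return cfg
        if status == "FAILED":
            raise RuntimeError(
                f"Provisioned concurrency failed: {cfg.get('StatusReason')}"
            )
        if waited >= timeout_s:
            raise TimeoutError(f"Still {status} after {timeout_s}s")
        sleep(poll_s)
        waited += poll_s
```

Call it as `wait_for_pc_ready(boto3.client("lambda"), "order-processor", "live")` and shift traffic only after it returns.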

The decision of how much PC to provision is a small table of trade-offs, not a guess:

If your traffic is…	Provision…	Driven by…	Why
Flat, predictable	A static floor near steady-state concurrency	Fixed PC	Simplicity; no spill
Diurnal (business hours)	Low floor + scheduled ramp	Scheduled AutoScaling	Pay for peak only when it exists
Spiky but gradual	Utilization target (0.7)	Target-tracking	Scales with demand; some lead time
Flash-spike (seconds)	Higher static floor for the window	Pre-scaled scheduled action	Target-tracking can’t react in seconds
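
A rough starting number for the floor can be worked out before any load test. The sketch below assumes concurrency is approximately request rate times average duration, and that a pool run at a 0.7 utilization target needs about peak concurrency divided by 0.7 environments. Both helper names and the traffic figures in the usage note are hypothetical, and measured `ConcurrentExecutions` should replace the estimate once you have it.

```python
import math


def required_concurrency(requests_per_second: float, avg_duration_s: float) -> int:
    """Estimate in-flight executions as arrival rate times time in the system."""
    return math.ceil(requests_per_second * avg_duration_s)


def provisioned_floor(peak_concurrency: int, target_utilization: float = 0.7) -> int:
    """Pool size that keeps the peak at or below the target utilization."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return math.ceil(peak_concurrency / target_utilization)
```

For example, 200 requests per second at 250 ms each is about 50 concurrent executions, and holding that at 70% utilization takes a pool of 72.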

SnapStart: snapshot-restore instead of re-init

SnapStart attacks cold starts from a different angle. Instead of keeping environments warm and paying for idle capacity, Lambda runs your init once at publish time, takes a Firecracker microVM snapshot of the initialized memory and disk, and restores from that snapshot on cold start instead of re-running init. It carries no provisioned-concurrency idle cost, though on the Python and .NET runtimes AWS bills for snapshot caching and restores, so check current pricing. SnapStart began on Java and AWS has extended it to Python and .NET runtimes; always confirm the runtimes available in your account’s region before committing.

# AWS SAM — enable SnapStart on a Java function
OrderProcessor:
  Type: AWS::Serverless::Function
  Properties:
    Runtime: java21
    Handler: com.example.Handler::handleRequest
    MemorySize: 1024
    SnapStart:
      ApplyOn: PublishedVersions
    AutoPublishAlias: live

SnapStart eligibility and behaviour differ by runtime family, and the wrong assumption here wastes a sprint — confirm your runtime and region before designing around it:

Runtime family	SnapStart support	Priming hook needed?	Typical cold-start win	Note
Java (Corretto 11/17/21)	Yes (original target)	Yes — JIT/class-load priming pays off most	Large (multi-second init → sub-second restore)	CRaC `org.crac` hooks; biggest beneficiary
Python (3.12+)	Yes (region-dependent)	Rarely — interpreted, less JIT	Moderate (import-heavy init)	Use lifecycle hooks for re-seed/refresh
.NET (8+)	Yes (region-dependent)	Sometimes — JIT/tiered-compilation	Moderate–large	Confirm regional availability
Node.js	No	n/a	Use provisioned concurrency instead	No snapshot model for Node
Go / Rust (custom runtime)	No	n/a	Fast init already; PC if needed	Native binaries cold-start fast
Container image functions	No (SnapStart is zip-based)	n/a	Trim image; PC for warmth	SnapStart doesn’t apply to images

The caveats are real and you must design for them. They fall into three classes, each with a concrete failure mode:

Caveat class	What goes wrong	Why	Where you fix it
Uniqueness	Same UUID/seed/timestamp across every restored env	Generated once at snapshot, then cloned	`afterRestore` — regenerate per-env values
Stale state	Dead DB connections, expired tokens/creds	Captured live at snapshot, expire by restore	`afterRestore` — re-establish/refresh
Priming	First real request still slow (JIT/lazy-load)	Restore is fast but JVM defers compilation	`beforeCheckpoint` — exercise hot paths

Anything generated once during init and captured in the snapshot (a random seed, a UUID, a cached timestamp) is now identical across every restored environment. Re-seed SecureRandom and regenerate per-environment values after restore, not at class load. The AWS Cryptography libraries handle this for you; hand-rolled randomness does not. Network connections, credentials, and ephemeral tokens captured in the snapshot may be dead or expired on restore, so re-establish them in a runtime hook. And while restore is fast, the JVM may still JIT-compile and lazy-load on the first real request, so use the beforeCheckpoint hook to prime hot paths (dummy invocations of your serialization, an SDK call) so that work is captured in the snapshot.

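import org.crac.Core;
import org.crac.Resource;

public class Handler implements Resource {
  public Handler() {
    // Register this instance so the hooks below run around the snapshot
    Core.getGlobalContext().register(this);
  }

  @Override
  public void beforeCheckpoint(org.crac.Context<? extends Resource> c) {
    // Prime: exercise hot paths so JIT/class-load is captured in the snapshot
    warmSerializers();
    warmSdkClients();
  }

  @Override
  public void afterRestore(org.crac.Context<? extends Resource> c) {
    // Re-establish anything that must be fresh per environment
    reSeedSecureRandom();
    refreshDbCredentials();
  }

  // Placeholder helpers: fill these in with your own priming and refresh logic
  private void warmSerializers() { /* e.g. serialize a sample payload */ }
  private void warmSdkClients() { /* e.g. make one cheap SDK call */ }
  private void reSeedSecureRandom() { /* e.g. replace cached random sources */ }
  private void refreshDbCredentials() { /* e.g. re-fetch tokens, reconnect */ }
}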

The CRaC lifecycle hooks, and exactly what belongs in each — putting work in the wrong hook is the most common SnapStart bug:

Hook	Runs…	Put here	Never put here
`beforeCheckpoint`	Once, at publish (pre-snapshot)	Priming: serializers, SDK warm-up, class-load	Anything that must be unique per env
`afterRestore`	On every restore (cold start)	Re-seed randomness, refresh creds, reconnect	Heavy one-time work (defeats the purpose)
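
The same split applies on the Python runtimes. The sketch below assumes the `snapshot_restore_py` module that the Lambda Python managed runtime provides for SnapStart hooks, with its `register_before_snapshot` and `register_after_restore` decorators; confirm the module and names against the current Lambda documentation before relying on them. The fallback branch exists only so the example runs outside Lambda, and `ENV_ID`, `prime`, `refresh` and `get_env_id` are illustrative names.

```python
import json
import uuid

try:
    # Assumed to be available inside the Lambda Python managed runtime.
    from snapshot_restore_py import register_after_restore, register_before_snapshot
except ImportError:
    # Local fallback: the decorators become no-ops so the hooks stay callable.
    def register_before_snapshot(func):
        return func

    def register_after_restore(func):
        return func


# Generated at init, so without the restore hook every restored
# environment would share this exact value.
ENV_ID = str(uuid.uuid4())


def get_env_id() -> str:
    return ENV_ID


@register_before_snapshot
def prime() -> None:
    # Priming: exercise a hot path so its setup is captured in the snapshot.
    json.dumps({"warm": True})


@register_after_restore
def refresh() -> str:
    # Uniqueness: regenerate anything that must differ per environment.
    global ENV_ID
    ENV_ID = str(uuid.uuid4())
    return ENV_ID
```

Credential refresh and reconnects belong in the same restore hook, next to the regenerated identifier.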

SnapStart vs provisioned concurrency is a real decision, not a default. SnapStart removes most of the init cold start with no idle charge, but every cold start still pays a restore, so it cannot match the flat tail of an already-warm pool, and it adds restore and priming complexity; provisioned concurrency gives the flattest tail latency but you pay for warm capacity continuously. Many teams run SnapStart by default and reserve PC for the few endpoints with the strictest p99. Side by side:

Dimension	Provisioned Concurrency	SnapStart
Cold start removed?	Yes (fully, within the pool)	Mostly (restore replaces init)
Idle cost	Yes: pay while enabled	No idle pool charge (Python and .NET add snapshot cache and restore charges)
Runtime support	All runtimes	Java, Python, .NET (region-dependent)
Tail-latency consistency	Flattest (no restore variance)	Restore + JIT priming variance
Code changes required	None	CRaC hooks (re-seed, refresh, prime)
Spillover behaviour	Overflow cold-starts on-demand	Each cold start restores (still fast)
Best for	Strict p99, any runtime	Cost-sensitive JVM/.NET/Python cold starts
Configured on	Version/alias	Published versions

The combined pattern many teams settle on, expressed as a decision table:

Constraint	Reach for	Why
Strict p99 on a Node/Go function	Provisioned concurrency	No SnapStart for Node/Go; PC flattens the tail
JVM cold starts dominate, cost matters	SnapStart + priming	Removes class-load/JIT tax for free
JVM estate with both cost pressure and strict p99	SnapStart on most functions; PC (without SnapStart) on the strict-p99 ones	The two cannot be enabled on the same function version, so choose per function
Python with heavy imports, cost-sensitive	SnapStart	Snapshots the import-heavy init

Connection management and reuse across invocations

The most common self-inflicted latency bug: opening a database connection, HTTP client, or secret fetch inside the handler. That work then runs on every warm invocation. Move it to module/static scope so it is created once during init and reused across invocations on the same environment.

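import os
import boto3
import psycopg2

# INIT SCOPE: runs once per environment, reused by every warm invocation
_secrets = boto3.client("secretsmanager")  # built once; use it to fetch DB credentials
_conn = None

def _get_conn():
    """Return the cached connection, reconnecting only if it is missing or closed."""
    global _conn
    if _conn is None or _conn.closed:
        _conn = psycopg2.connect(host=os.environ["DB_HOST"], connect_timeout=3)
        _conn.autocommit = True  # don't leave a transaction open between invocations
    return _conn

def handler(event, context):
    with _get_conn().cursor() as cur:    # reuse the connection, close the cursor
        cur.execute("SELECT 1")
        return {"ok": cur.fetchone()[0]}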

For Node on SDK v2, set AWS_NODEJS_CONNECTION_REUSE_ENABLED=1 so the SDK reuses keep-alive TCP connections (SDK v3 reuses connections by default, so the variable is harmless but unnecessary there). What belongs in init scope versus the handler is a clean rule you can audit code against:

Work	Where it belongs	Why	Cost of getting it wrong
SDK / service clients	Init scope	Constructed once, thread-safe, reusable	Per-invocation construction latency
DB connection / pool	Init scope (lazy-guarded)	Reuse the TCP/auth handshake	Connection blowup; handshake per call
Static config / secrets	Init scope (cached)	Fetch once, reuse	Repeated SSM/Secrets calls, throttling
HTTP keep-alive client	Init scope	Reuse the connection pool	New TCP per call; SNAT/port pressure
Per-request state	Handler	Must be fresh each invocation	Cross-request data bleed (a real bug)
Per-request randomness / timestamps	Handler	Must differ per call	Duplicate IDs (worse under SnapStart)
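
The "init scope (cached)" rule for config and secrets is often implemented as a small time-bounded cache held at module level. The sketch below is one minimal way to do that; `CachedValue` is a hypothetical helper, the 300-second default TTL is an arbitrary illustration, and the `fetch` callable stands in for whatever SSM or Secrets Manager call you actually make.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class CachedValue(Generic[T]):
    """Fetch a value once, then reuse it until the TTL expires."""

    def __init__(self, fetch: Callable[[], T], ttl_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl_s = ttl_s
        self._clock = clock
        self._value: Optional[T] = None
        self._expires_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        # Refetch on first use and whenever the cached copy has aged out.
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl_s
        return self._value
```

Create the instance at module scope and call `.get()` inside the handler: warm invocations then skip the network call, while a rotated secret is still picked up within one TTL.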

The deeper problem at scale is connection-count blowup: 500 concurrent Lambda environments each holding a Postgres connection can exhaust max_connections on a small or mid-size instance, and even where the ceiling is higher, every open connection costs the database memory. Amazon RDS Proxy solves this by pooling and multiplexing connections on Lambda’s behalf, and it lets functions authenticate with IAM instead of embedding database passwords. (The full operational treatment is in RDS Proxy in Production: Connection Pooling, Failover Acceleration, and IAM Authentication.)

aws rds create-db-proxy \
  --db-proxy-name app-proxy \
  --engine-family POSTGRESQL \
  --auth '[{"AuthScheme":"SECRETS","SecretArn":"arn:aws:secretsmanager:us-east-1:111122223333:secret:db-creds","IAMAuth":"REQUIRED"}]' \
  --role-arn arn:aws:iam::111122223333:role/rds-proxy-role \
  --vpc-subnet-ids subnet-0a1b2c subnet-0d4e5f

Point the function’s DB_HOST at the proxy endpoint, attach the function to the same VPC subnets, and let the proxy absorb the connection churn. This is non-negotiable above a few hundred concurrent executions against a relational database. The choices for taming connections, and what each buys:

Approach	What it does	Effort	When it’s enough	Limit / watch-out
Init-scope reuse (code)	One connection per env, reused	Code change	Low/moderate concurrency	Still 1 conn × concurrency at the DB
`AWS_NODEJS_CONNECTION_REUSE_ENABLED=1`	SDK keep-alive reuse	One env var	Node SDK HTTP reuse	SDK v3 already does it
RDS Proxy	Pools + multiplexes; IAM auth	Proxy + IAM + subnets	High concurrency on RDS/Aurora	Small hourly cost; VPC plumbing
DynamoDB instead of RDS	No connection model at all	Re-architecture	Key-value access patterns	Different data model
Reserved concurrency cap	Bounds connections from this fn	One setting	Protecting a fragile DB	Throttles past the cap

A worked sizing example: a relational instance has a finite max_connections (a few hundred on mid-size classes). With one connection per environment and 500 concurrent environments, you need 500 connections — past the ceiling, and the database starts refusing connects, which surfaces in Lambda as dependency timeouts, not as an obvious “too many connections” on the Lambda side. RDS Proxy multiplexes those 500 environments onto a far smaller pool of actual backend connections.
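
The arithmetic in that sizing example is simple enough to keep as a pre-launch check. A rough sketch, assuming a hypothetical max_connections of 400 and 20 connections held back for admin and other clients (neither number comes from a specific instance class):

```python
def backend_connections_needed(concurrent_envs: int, conns_per_env: int = 1) -> int:
    """Connections the database sees when every environment holds its own."""
    return concurrent_envs * conns_per_env


def max_safe_concurrency(max_connections: int, reserved_for_others: int = 0,
                         conns_per_env: int = 1) -> int:
    """Largest Lambda concurrency the database can absorb without a proxy."""
    usable = max(max_connections - reserved_for_others, 0)
    return usable // conns_per_env


# Hypothetical numbers for illustration only.
needed = backend_connections_needed(concurrent_envs=500)          # 500
ceiling = max_safe_concurrency(max_connections=400,
                               reserved_for_others=20)            # 380
over_the_ceiling = needed > ceiling                               # True
```

When over_the_ceiling is true, the choices are the ones in the table above: put RDS Proxy in front, or set reserved concurrency on the function to a value at or below the computed ceiling.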

Concurrency controls: reserved, throttles, and quota planning

Concurrency is the number of in-flight executions. Your account has a regional concurrency limit (1,000 by default, raisable via a Service Quotas request). Two controls shape how that pool is shared:

Reserved concurrency caps a function at a maximum and guarantees that floor for it, carving it out of the shared pool. Use it to (a) protect a downstream like a database from being overwhelmed and (b) stop one noisy function from starving the rest of the account.
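Provisioned concurrency (above) is pre-initialized capacity on a version or alias. It counts against the function’s reserved concurrency when one is set, and against the account limit either way.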

# Cap order-processor at 200 concurrent executions
aws lambda put-function-concurrency \
  --function-name order-processor \
  --reserved-concurrent-executions 200

The three concurrency concepts confuse everyone at first; side by side they are clear:

Concept	What it is	Guarantees a floor?	Caps a max?	Pre-warmed?	Charged when idle?
Account/regional limit	Total in-flight for the region	No (shared)	Yes (account-wide)	No	No
Reserved concurrency	A function’s carved-out slice	Yes	Yes	No	No
Provisioned concurrency	Pre-initialized subset of reserved	Yes (warm)	Yes	Yes	Yes
Unreserved pool	What’s left after reservations	No	Implicitly	No	No

When a function hits its reserved limit (or the account hits the regional limit), Lambda throttles — and how the throttle surfaces depends entirely on the invocation type. This is the table to keep open during a spike incident:

Invocation type	Examples	On throttle	Retries?	What the caller sees
Synchronous	API Gateway, ALB, direct invoke	Rejected immediately	No (caller must)	`429 TooManyRequestsException`
Asynchronous	S3, SNS, EventBridge	Lambda retries with backoff	Yes (up to ~6 h, then DLQ)	Delayed processing; DLQ on exhaustion
Event source mapping	SQS, Kinesis, DynamoDB Streams	Batch retried per source rules	Yes (source-dependent)	Backlog grows; iterator age climbs
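
The "No (caller must)" cell in the synchronous row means the retry policy lives in your client. A minimal sketch of exponential backoff with full jitter, assuming a hypothetical ThrottledError that your client code raises when it sees a 429; the attempt count and delays are illustrative, not recommendations:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a 429 TooManyRequestsException on a sync invoke."""


def invoke_with_backoff(call: Callable[[], T], max_attempts: int = 5,
                        base_delay: float = 0.1, max_delay: float = 2.0,
                        sleep: Callable[[float], None] = time.sleep,
                        rng: Callable[[], float] = random.random) -> T:
    """Retry a throttled call with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return call()
        except ThrottledError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # give up and surface the throttle to the caller
            cap = min(max_delay, base_delay * (2 ** (attempt - 1)))
            sleep(cap * rng())  # full jitter: a random wait in [0, cap)
```

Jitter matters here because a throttled fleet of callers that all retry on the same schedule simply recreates the spike that caused the throttle.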

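There is also a burst concurrency ceiling that governs how fast you can scale from cold: Lambda adds execution environments at a bounded rate rather than all at once (current AWS documentation describes up to 1,000 additional concurrent executions per function every 10 seconds; older material describes an initial burst followed by a slower per-minute ramp toward your account limit). A flash spike can outrun that rate even when you are nowhere near the account limit. The signals that tell you which wall you hit, and the fix: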

Signal	Metric / where	Means	Fix
`Throttles` climbing, `ConcurrentExecutions` at account limit	CloudWatch Lambda metrics	Account/regional cap hit	Service Quotas increase; reserve to partition
`Throttles` on one function only	Per-function `Throttles`	That function’s reserved cap hit	Raise its reserved; or it’s protecting a downstream, leave it
`ProvisionedConcurrencySpilloverInvocations` > 0	Per-alias metric	Demand exceeded the PC pool	Raise PC floor / AutoScaling max
Throttles during a fast ramp, below account limit	`ConcurrentExecutions` slope	Burst ceiling outrun	Pre-scale PC for the window; smooth the spike
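
The table reads as a decision procedure, so it can be sketched as one. This is a rough triage helper under stated assumptions: the Snapshot fields are hypothetical names for values you would read off CloudWatch yourself, and the ordering of the checks is one reasonable choice, not an AWS-defined algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    throttles: int                        # Throttles over the window
    concurrent_executions: int            # account-wide ConcurrentExecutions
    account_limit: int                    # regional concurrency quota
    function_concurrency: int             # this function's concurrency
    reserved_limit: Optional[int] = None  # reserved concurrency, if set
    spillover_invocations: int = 0        # ProvisionedConcurrencySpilloverInvocations


def diagnose(s: Snapshot) -> str:
    """Map the four signals in the table to the wall that was hit."""
    if s.throttles == 0:
        return "pc-spillover" if s.spillover_invocations > 0 else "ok"
    if s.concurrent_executions >= s.account_limit:
        return "account-limit"
    if s.reserved_limit is not None and s.function_concurrency >= s.reserved_limit:
        return "reserved-cap"
    # Throttled while below both caps: the ramp outran the scaling rate.
    return "burst-ceiling"
```

For example, `diagnose(Snapshot(throttles=5, concurrent_executions=400, account_limit=1000, function_concurrency=150, reserved_limit=200))` lands on the burst ceiling, because neither cap was reached.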

The hard limits that shape every concurrency and performance decision — the real numbers, what they cap, and whether you can raise them:

Limit	Default value	What it caps	Raisable?	Hit it and you get…
Account concurrency (per region)	1,000	Total in-flight executions	Yes (Service Quotas)	Account-wide throttles (429)
Memory per function	128 MB – 10,240 MB	RAM (and CPU share)	No (it’s the range)	Can’t exceed 10,240 MB
Function timeout	3 s default, 900 s max	Max single-invocation duration	No (max is 900 s)	Invocation killed at the cap
API Gateway integration timeout	29 s	Sync request behind API GW	No	`504`/timeout to the client
Deployment package (zip, direct)	50 MB zipped	Upload size via API	Use S3 / image instead	Upload rejected
Deployment package (unzipped)	250 MB	Code + layers unzipped	No	Deploy rejected
Layers per function	5	Attached layers	No	Can’t add a 6th
`/tmp` ephemeral storage	512 MB – 10,240 MB	Scratch disk	Configurable in range	Disk-full errors
Environment variables size	4 KB total	Env var payload	No	Config rejected
Provisioned concurrency	≤ reserved/account	Pre-warmed pool size	Via the above	Can’t exceed the slice
Burst concurrency	Initial burst + per-min ramp	Scale-from-cold rate	No (managed)	Throttles during sharp ramps
Invocation payload (sync)	6 MB	Request/response body	No	`RequestEntityTooLargeException` (413)
Invocation payload (async)	256 KB	Event body	No	Event rejected
Function + layer storage (account)	75 GB (default)	Total code storage	Yes (Service Quotas)	`CodeStorageExceeded` on deploy
Concurrent executions per PC config	= allocated PC	Warm pool ceiling	Via account/reserved	Spillover to on-demand (cold)

Plan for it: set reserved concurrency on the function fronting your most fragile dependency, alarm on the Throttles metric, and request a regional quota increase before a launch, not during the incident.

# Terraform: reserve concurrency to partition + protect a downstream
resource "aws_lambda_function" "order_processor" {
  function_name                  = "order-processor"
  filename                       = "function.zip"   # hypothetical package path; s3_* or image_uri also work
  role                           = aws_iam_role.lambda.arn
  handler                        = "handler.handler"
  runtime                        = "nodejs20.x"
  memory_size                    = 1024
  reserved_concurrent_executions = 200   # cap + guarantee
}
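
To know whether a quota request is needed before a launch, estimate concurrency from traffic: in-flight executions are roughly requests per second multiplied by average duration. A small sketch of that estimate, with hypothetical traffic numbers; it ignores the scaling-rate ceiling and uses average duration, so treat the result as a floor rather than a forecast.

```python
import math


def required_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """Estimated in-flight executions: arrival rate times time in service."""
    return math.ceil(requests_per_second * avg_duration_seconds)


def quota_headroom(required: int, account_limit: int = 1000,
                   reserved_elsewhere: int = 0) -> int:
    """Concurrency left in the region after this workload; negative means throttles."""
    return account_limit - reserved_elsewhere - required


# Hypothetical steady state: 2,000 req/s at 250 ms average duration.
steady = required_concurrency(2000, 0.25)                          # 500
# The same function under a 10x flash spike.
spike = required_concurrency(20000, 0.25)                          # 5000
spike_headroom = quota_headroom(spike, account_limit=1000)         # -4000
```

A negative headroom for the spike case is the signal to file the Service Quotas request well ahead of the event.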

Observability: see the cold starts you are paying for

You cannot tune what you cannot measure. Three layers, and each answers a different question. First, the layer map:

Layer	What it gives you	Setup	Best for
CloudWatch Logs Insights	Query `REPORT` fields at scale	None (logs exist)	Cold-start %, init cost, p99, peak memory
CloudWatch metrics	`Throttles`, `ConcurrentExecutions`, spillover	None (emitted)	Alarms; spike/throttle/spillover signals
Lambda Insights	Per-function CPU/mem/network/init	One layer + policy	Resource view without parsing logs
AWS X-Ray	Per-request segment breakdown	Active tracing flag	“Is the latency mine or a dependency’s?”

CloudWatch Logs Insights — quantify cold-start frequency and init cost straight from the REPORT lines:

filter @type = "REPORT"
| fields @initDuration, @duration, @billedDuration, @maxMemoryUsed / 1000000 as memUsedMB
| stats count(*) as invocations,
        count(@initDuration) as coldStarts,
        avg(@initDuration) as avgInitMs,
        pct(@duration, 99) as p99DurationMs,
        max(memUsedMB) as peakMemMB

If peakMemMB sits far below your configured memory, you over-allocated; if coldStarts / invocations is high on a latency-sensitive function, that is your provisioned-concurrency / SnapStart signal. The questions you will ask in an incident, each as a one-line query target:

Question	Field(s)	Read it as
How often do we cold-start?	`count(@initDuration) / count(*)`	High on a sync API → buy warmth
How expensive is init?	`avg(@initDuration)`	Drives the cold-start tail
Is p99 inside SLO?	`pct(@duration, 99)`	The number the SLA is written against
Did we over-allocate memory?	`max(@maxMemoryUsed)` vs `Memory Size`	Far below → drop memory
Are we paying for rounding?	`@billedDuration` vs `@duration`	Billed in 1 ms increments, rounded up
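
When you only have a handful of raw log lines (a local tail, a pasted incident snippet), the same numbers can be computed offline. A minimal sketch that parses tab-separated REPORT lines; it assumes the standard REPORT layout and uses a nearest-rank percentile, which can differ slightly from the value Logs Insights reports.

```python
import math
from typing import Dict, Iterable


def parse_report(line: str) -> Dict[str, float]:
    """Turn one REPORT line into {field name: numeric value}."""
    fields: Dict[str, float] = {}
    for part in line.strip().split("\t"):
        key, sep, value = part.partition(": ")
        if not sep:
            continue
        try:
            fields[key.strip()] = float(value.split(" ")[0])
        except ValueError:
            continue  # non-numeric field such as the RequestId
    return fields


def summarize(lines: Iterable[str]) -> Dict[str, float]:
    """Cold-start ratio, init cost, p99 duration and peak memory."""
    reports = [parse_report(l) for l in lines if l.startswith("REPORT")]
    if not reports:
        raise ValueError("no REPORT lines found")
    durations = sorted(r["Duration"] for r in reports)
    inits = [r["Init Duration"] for r in reports if "Init Duration" in r]
    rank = max(math.ceil(0.99 * len(durations)), 1)  # nearest-rank p99
    return {
        "invocations": len(reports),
        "coldStarts": len(inits),
        "coldStartRatio": len(inits) / len(reports),
        "avgInitMs": sum(inits) / len(inits) if inits else 0.0,
        "p99DurationMs": durations[rank - 1],
        "peakMemMB": max(r["Max Memory Used"] for r in reports),
    }
```

Splitting on tabs rather than matching "Duration" with a regular expression avoids confusing Duration with Billed Duration or Init Duration.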

Lambda Insights: a managed CloudWatch extension that surfaces CPU, memory, network, and init metrics per function once you attach one layer and one managed policy:

OrderProcessor:
  Type: AWS::Serverless::Function
  Properties:
    Policies:
      - CloudWatchLambdaInsightsExecutionRolePolicy
    Layers:
      - !Sub "arn:aws:lambda:${AWS::Region}:580247275435:layer:LambdaInsightsExtension:53"

AWS X-Ray — turn on active tracing to break a request into segments. The init subsegment shows cold-start cost, and downstream segments (DynamoDB, RDS, an HTTP call) reveal whether your latency is actually in your code or in a dependency you mistuned:

aws lambda update-function-configuration \
  --function-name order-processor \
  --tracing-config Mode=Active

The CloudWatch metrics worth an alarm, with starting thresholds — these are the leading indicators, not the lagging “errors spiked”:

Metric	What it signals	Starting threshold	Why it’s leading
`Throttles`	Concurrency limit hit	> 0 sustained 5 min	Requests being rejected/queued now
`ProvisionedConcurrencySpilloverInvocations`	PC floor too low	> 0 sustained	Cold starts leaking past the pool
`ConcurrentExecutions`	Approaching account/reserved cap	> 80% of limit	Predicts throttles before they bite
`Duration` p99	Warm-path latency creeping	> your SLO	Tail drifting toward a timeout
`Errors`	Function failures	> 1% of invocations	Confirmation; pair with the cause
`ProvisionedConcurrencyUtilization`	PC right-sized?	sustained < 0.3 or > 0.9	Over- or under-provisioned warmth
`IteratorAge` (stream sources; for SQS, watch the queue’s `ApproximateAgeOfOldestMessage`)	Event-source backlog growing	> your freshness target	Throttles/slow consumers fall behind
`DeadLetterErrors`	Async failures not reaching DLQ	> 0	Lost events; DLQ misconfigured
`ClaimedAccountConcurrency`	Account headroom consumed	> 80% of limit	Region nearing the account cap

Cost vs latency: a decision framework

There is no universal “fastest” setting — there is the cheapest setting that meets your latency SLO. Walk it in this order; each row is symptom → first lever → then consider:

Symptom	First lever	Then consider
Function feels slow, no SLO pressure	Power Tuning (right-size memory)	Trim package / init code
High p99 on a synchronous API	Power Tuning, then PC on the alias	SnapStart if JVM/Python/.NET
JVM cold starts dominate, cost-sensitive	SnapStart with priming hooks	PC for the few strict-p99 paths
DB connection errors at scale	Init-scope reuse + RDS Proxy	Reserved concurrency on the DB-facing fn
Throttles under spike	Request regional quota increase	Reserved concurrency to protect/partition
Spillover cold starts past the PC pool	Raise PC floor / AutoScaling max	Pre-scale for known flash windows
Strict, flat tail latency required	Provisioned concurrency	(SnapStart alone won’t flatten it)

The guiding principle: tune memory before you buy warmth. Right-sizing is free and often cuts both latency and cost; provisioned concurrency and SnapStart are how you buy down the cold start that remains, and they trade money or complexity for tail latency. Spend that money only on the paths whose SLO actually requires it. The full lever menu, ranked by what it costs and what it fixes:

Lever	What it fixes	Cost	Effort	Trade-off
Right-size memory	Slow warm path; wasted GB-s	Free (often saves)	Low (Power Tuning)	Must tune with real payload
Trim package / init	Oversized init phase	Free	Medium	Build discipline
Init-scope reuse	Per-invocation connection cost	Free	Low (code move)	None — pure win
SnapStart	Init cold start (JVM/Py/.NET)	No extra charge on Java; snapshot cache + restore charges on Python/.NET	Medium (runtime hooks; CRaC on Java)	Snapshot caveats
Provisioned concurrency	Cold start on any runtime	Idle charge while enabled	Low	Pay for warmth
RDS Proxy	Connection blowup at scale	Small hourly	Medium (VPC/IAM)	Plumbing
Reserved concurrency	Throttle blast radius	Free	Low	Caps the function
Quota increase	Account-limit throttles	Free (request)	Low (lead time)	Must ask ahead
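
Why right-sizing can be "Free (often saves)" comes down to the compute bill being memory multiplied by duration. A small sketch of that arithmetic; the per-GB-second price is an illustrative assumption (check current pricing for your region and architecture), the 100 ms baseline is hypothetical, and the per-request charge is left out because it does not change with memory.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # assumption: illustrative price, not authoritative


def compute_cost(memory_mb: float, duration_ms: float, invocations: int = 1,
                 price: float = PRICE_PER_GB_SECOND) -> float:
    """Duration charge only: GB-seconds consumed times the price."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000) * invocations
    return gb_seconds * price


def relative_cost(old_mb: float, old_ms: float, new_mb: float, new_ms: float) -> float:
    """New cost as a fraction of old cost; the price cancels out."""
    return (new_mb * new_ms) / (old_mb * old_ms)


# A CPU-bound function that gets 45% faster moving from 1024 MB to 1769 MB,
# starting from a hypothetical 100 ms warm duration.
ratio = relative_cost(old_mb=1024, old_ms=100, new_mb=1769, new_ms=55)  # about 0.95
```

A ratio just under 1.0 means the larger setting costs about the same per call while answering in roughly half the time, which is the shape Power Tuning looks for.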

Architecture at a glance

The diagram traces a single synchronous invocation across four zones, left to right, and marks the five places latency or throttles actually bite. A client calls API Gateway (the sync entry, with its 29-second ceiling), which invokes the function. The function lands in the warmth tier, where one of three things serves it: a provisioned-concurrency environment that is already initialized and ready (the flat-tail path), a SnapStart environment that restores from a snapshot instead of re-running init, or — when demand exceeds the warm pool — an on-demand environment that pays a full cold start (the spillover path, marked in red). Inside any of those, well-written code reuses init-scope connections into the RDS Proxy (which pools and multiplexes onto the database) and DynamoDB (keep-alive reuse), rather than opening a connection per call.

The fourth zone is the control and observability loop that makes the rest work. Application Auto Scaling watches provisioned-concurrency utilization (target 0.70) and raises or lowers the warm pool, closing the loop back into the warmth tier — that is the arrow from control back to the function. CloudWatch and X-Ray capture Init Duration, Throttles, spillover and per-segment latency, which is how you prove a lever worked rather than assuming it did. Follow the five numbered badges in order — spillover cold start, SnapStart uniqueness/stale state, connection blowup, PC-not-ready/on-$LATEST, and the throttle wall — and the legend narrates each as symptom, the metric that confirms it, and the fix.

Real-world scenario

Solvent Pay, a payments platform, ran a synchronous “authorize transaction” Lambda (Java 17, Spring) behind API Gateway in ap-south-1. p50 was a healthy 40 ms, but p99 spiked to 6+ seconds whenever traffic stepped up — classic JVM cold starts as new environments spun to meet demand. The constraint was hard: a contractual p99 < 800 ms with the card network, and a finance mandate to cut Lambda spend that had ballooned after a previous engineer “fixed” an earlier latency issue by setting provisioned concurrency to a flat 300 around the clock — paying for 300 warm JVMs at 3 AM for a daytime workload. Monthly Lambda spend on that one function had crossed ₹2.4 lakh.

The team reworked it in three moves, in the right order. First, Power Tuning (run with a real authorization payload, not an empty event) showed the function was CPU-bound; moving from 1024 MB to 1769 MB cut warm duration by ~45% at roughly neutral cost — fewer GB-seconds per call offsetting the higher memory. That was the free win, taken first. Second, they enabled SnapStart with a beforeCheckpoint hook that primed the Spring context, the Jackson serializers, and the SDK clients, and an afterRestore hook that re-seeded SecureRandom and refreshed the database credentials. This removed the multi-second class-load/JIT penalty from cold starts entirely, at zero idle cost — and the afterRestore work was not optional: an early test build skipped the re-seed and two restored environments generated the same idempotency key, which their integration suite caught before it reached production.

Third, they replaced the flat 300 provisioned concurrency with Application Auto Scaling: a small floor (10) for always-on readiness, a target-tracking policy at 0.70 utilization, and a scheduled action that pre-ramped to 150 fifteen minutes before the daily 18:00 peak (because target-tracking alone reacts too slowly for a sharp diurnal step).

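# The combination that hit the SLO: SnapStart for the floor, scheduled PC for the peak
# Caveat: AWS documents SnapStart and provisioned concurrency as mutually exclusive on a
# single function version, so verify current support before deploying both on one alias;
# if the restriction applies, put them on separate versions or functions.
AuthorizeTxn:
  Type: AWS::Serverless::Function
  Properties:
    Runtime: java17
    MemorySize: 1769
    SnapStart:
      ApplyOn: PublishedVersions
    AutoPublishAlias: live
    ProvisionedConcurrencyConfig:
      ProvisionedConcurrentExecutions: 10   # baseline; Application Auto Scaling ramps to 150 on schedule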
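
To see why a 0.70 target with a floor of 10 and a scheduled 150 behaves the way it does, it helps to model what target tracking is aiming for. This is a simplified sketch, not Application Auto Scaling's actual algorithm: it assumes the policy converges on the pool size that puts utilization at the target, clamped to the configured bounds, and it ignores cooldowns and alarm evaluation delay.

```python
def desired_provisioned(in_use: int, target_percent: int = 70,
                        floor: int = 10, ceiling: int = 150) -> int:
    """Pool size that puts utilization at the target, clamped to [floor, ceiling]."""
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in (0, 100]")
    # Integer ceiling division avoids float rounding surprises.
    wanted = (in_use * 100 + target_percent - 1) // target_percent
    return max(floor, min(ceiling, wanted))


overnight = desired_provisioned(in_use=3)     # 10: the floor holds
daytime = desired_provisioned(in_use=70)      # 100: 70 in use at 70% utilization
peak = desired_provisioned(in_use=140)        # 150: clamped at the ceiling
```

The clamped case is the one to watch: once demand exceeds the ceiling, the extra requests spill to on-demand environments and show up in ProvisionedConcurrencySpilloverInvocations.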

The validation was disciplined: they ran the Logs Insights query over a window after each change and watched avgInitMs collapse toward the restore floor, p99DurationMs settle, ProvisionedConcurrencySpilloverInvocations go to zero during the ramped peak, and peakMemMB justify the 1769 MB. Result: p99 settled under 500 ms even during step-ups, and the function’s monthly Lambda bill dropped roughly 70% versus the flat-300 configuration, from ₹2.4 lakh to about ₹72,000. The lesson the team wrote into their runbook: “SnapStart removes the init tax for free; provisioned concurrency is for the peak tail you still cannot tolerate, and you scale it, you do not nail it to the floor. And never ship SnapStart without testing the afterRestore path, or you will clone an idempotency key into production.”

The incident-to-fix sequence, because the order of moves is the lesson:

Phase	State	Action	Effect
Baseline	p99 6+ s on step-up, ₹2.4 L/mo	(flat PC 300, 1024 MB)	Slow tail and overspending
Move 1	CPU-bound at 1024 MB	Power Tuning → 1769 MB	Warm duration −45%, ~neutral cost
Move 2 (bug)	SnapStart on, no re-seed	Skipped `afterRestore`	Duplicate idempotency key in test
Move 2 (fixed)	SnapStart + CRaC hooks	Prime + re-seed + refresh creds	Init tax gone, zero idle cost
Move 3	Flat 300 PC wasteful	AutoScaling floor 10 + scheduled 150	Pay for peak only when it exists
Result	p99 < 500 ms, ₹72 k/mo	Validated via Logs Insights	SLO met, bill −70%

Advantages and disadvantages

The buy-or-build-warmth model — right-size free, then choose PC or SnapStart — gives you precise control, but each lever has a cost shape you must weigh honestly:

Advantages (why this model helps)	Disadvantages (why it bites)
Memory tuning improves latency and cost at once — the only free lever	“Memory is CPU” is unintuitive; teams set 128 MB and get slow, equal-cost functions
Provisioned concurrency gives a flat, predictable tail for any runtime	You pay for PC whether or not it’s invoked — flat 24x7 PC is a classic overspend
SnapStart removes the init tax with zero idle cost	Snapshot caveats are subtle: cloned UUIDs and dead connections cause real, hard-to-spot bugs
Reserved concurrency partitions the account and protects fragile downstreams	Set too low it throttles legitimate traffic; the cap is also a ceiling
RDS Proxy makes relational DBs survive high concurrency	Adds VPC/IAM plumbing and a small hourly cost; another hop to operate
Every lever is measurable (`REPORT`, metrics, X-Ray)	Tuning blind is easy if you skip instrumentation; the optimum is payload-specific
Application Auto Scaling makes warmth follow demand	Target-tracking reacts in minutes, not seconds — flash spikes need pre-scaling

The model is right for any latency- or cost-sensitive serverless workload where you are willing to measure first and tune deliberately. It bites teams that reach for warmth before right-sizing (overspending on a problem free tuning would have solved), that enable SnapStart without testing the restore path (cloning state into production), or that nail PC to a flat number (paying for 3 AM warmth). Every disadvantage is manageable — but only if you know it exists, which is the entire point of measuring before you tune.

Hands-on lab

Right-size a function by hand (the same memory sweep Power Tuning automates), then add provisioned concurrency on an alias and prove it’s READY, all from the CLI. Free-tier-friendly (a few short invocations; we delete PC at the end so there’s no lingering idle charge). Run in CloudShell or any shell with the aws CLI configured.

Step 1 — Variables.

REGION=ap-south-1
FN=lab-cold-start-$RANDOM
# Assumes an execution role named lambda-basic-exec already exists in the account
ROLE_ARN=$(aws iam get-role --role-name lambda-basic-exec --query 'Role.Arn' --output text)

Step 2 — Create a tiny CPU-bound function (Node) at 128 MB to reproduce the problem.

cat > handler.js <<'EOF'
// Init scope: runs once per environment
const start = Date.now();
exports.handler = async () => {
  // A little CPU work so memory-as-CPU is visible
  let x = 0;
  for (let i = 0; i < 5_000_000; i++) x += Math.sqrt(i);
  return { ok: true, sinceInitMs: Date.now() - start, x };
};
EOF
zip function.zip handler.js

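aws lambda create-function --function-name $FN --runtime nodejs20.x \
  --handler handler.handler --zip-file fileb://function.zip \
  --role "$ROLE_ARN" --memory-size 128 --timeout 10 --region $REGION
# Wait until the new function leaves the Pending state before invoking it
aws lambda wait function-active-v2 --function-name $FN --region $REGION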

Step 3 — Invoke a few times and read the REPORT line. The first call is cold (Init Duration present); note Duration at 128 MB.

for i in 1 2 3; do
  aws lambda invoke --function-name $FN --region $REGION /dev/null \
    --log-type Tail --query 'LogResult' --output text | base64 -d | grep REPORT
done

Expected: a REPORT line on the first call with Init Duration: ...; subsequent calls warm. The CPU loop’s Duration will be high at 128 MB.

Step 4 — Raise memory to 1024 MB and re-measure. Same code, more CPU.

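aws lambda update-function-configuration --function-name $FN \
  --memory-size 1024 --region $REGION
# Wait for the update to finish so the next invoke really runs at 1024 MB
aws lambda wait function-updated --function-name $FN --region $REGION
aws lambda invoke --function-name $FN --region $REGION /dev/null \
  --log-type Tail --query 'LogResult' --output text | base64 -d | grep REPORT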

Expected: Duration drops sharply versus 128 MB — that’s CPU scaling with memory. Note Max Memory Used is far below 1024, so for a non-CPU-bound function you’d drop back down; here the CPU work justifies the memory.

Step 5 — Publish a version, point an alias at it, and add provisioned concurrency.

VER=$(aws lambda publish-version --function-name $FN --region $REGION --query Version --output text)
aws lambda create-alias --function-name $FN --name live \
  --function-version "$VER" --region $REGION
aws lambda put-provisioned-concurrency-config --function-name $FN \
  --qualifier live --provisioned-concurrent-executions 2 --region $REGION

Step 6 — Wait for PC to be READY, then confirm no cold start on the alias.

# Poll until READY (don't serve traffic before this); press Ctrl-C if it reports FAILED
until [ "$(aws lambda get-provisioned-concurrency-config --function-name $FN \
    --qualifier live --region $REGION --query Status --output text)" = "READY" ]; do
  sleep 5
done
aws lambda get-provisioned-concurrency-config --function-name $FN \
  --qualifier live --region $REGION \
  --query '{status:Status, allocated:AllocatedProvisionedConcurrentExecutions}'

# Invoke the ALIAS: REPORT should have NO "Init Duration"
aws lambda invoke --function-name $FN:live --region $REGION /dev/null \
  --log-type Tail --query 'LogResult' --output text | base64 -d | grep REPORT

Expected: Status: READY, allocated: 2; the alias invocation’s REPORT has no Init Duration — the cold start is gone because the environment was pre-warmed.

The lab steps mapped to what each proves:

Step	What you did	What it proves	Real-world analogue
3	Invoke at 128 MB, read `REPORT`	`Init Duration` only on cold starts; CPU is throttled	The “Lambda is slow” complaint
4	Raise to 1024 MB, re-measure	Memory is CPU — duration drops	The free right-sizing win
5	Version + alias + PC	PC must target a version/alias, never `$LATEST`	The deploy-then-shift model
6	Wait for `READY`, invoke alias	A warm pool removes the cold start	Buying down the tail on a sync API

Cleanup (remove the idle PC charge first, then the function).

aws lambda delete-provisioned-concurrency-config --function-name $FN \
  --qualifier live --region $REGION
aws lambda delete-function --function-name $FN --region $REGION

Cost note. A handful of sub-second invocations is effectively free under the Lambda free tier; the only chargeable item is provisioned concurrency, which bills for the time it’s enabled — deleting it in cleanup (before the function) stops that immediately. The whole lab runs to a few rupees at most.

Common mistakes & troubleshooting

This is the playbook — the part you bookmark. First, the error/exception reference: the specific strings and codes you’ll see, what they mean on Lambda, and the fix — scan this before the symptom table.

Error / code	Where it surfaces	What it means	Likely cause	Fix
`429 TooManyRequestsException`	Sync caller (API GW, invoke)	Concurrency limit hit	Reserved/account cap or burst ceiling	Raise reserved; quota increase; pre-scale PC
`Rate Exceeded` / `ThrottlingException`	Control-plane calls	API throttled	Too many config/list calls	Back off; batch; cache
`ProvisionedConcurrencyConfigNotFoundException`	`get-provisioned-concurrency-config`	No PC on that qualifier	PC not set, or wrong alias/version	Attach PC to the alias traffic uses
`InvalidParameterValueException` (`$LATEST`)	`put-provisioned-concurrency-config`	PC rejected on `$LATEST`	Targeted `$LATEST`	Publish a version; target an alias
`Status: FAILED` (PC)	PC config status	Allocation failed	Not enough account concurrency	Raise account limit; lower other PC
`Task timed out after N seconds`	Function logs	Hit the configured timeout	Slow code/dependency; timeout too low	Speed up; raise timeout (≤900 s)
`Endpoint request timed out` (504)	API Gateway	Function exceeded 29 s	Long sync work behind API GW	Make async; shorten; Step Functions
`Runtime exited ... signal: killed`	Function logs	OOM — exceeded memory	Memory too low for the workload	Raise memory; fix leak
`connection ... too many connections`	DB / dependency logs	DB `max_connections` hit	Connection blowup at scale	Init-scope reuse; RDS Proxy
`ResourceConflictException`	Deploy / config update	Concurrent update in progress	Overlapping deploys/config calls	Serialize; retry after settle
`ENILimitReached` / slow VPC cold start	VPC-attached function	Hyperplane ENI pressure	Many subnets/functions, small subnet	Right-size subnet CIDR; consolidate
`Unable to import module 'handler'`	Init phase	Init crashed before handler	Missing dep, bad layer, import error	Fix dependency/layer; check `unzip -l`

Now the symptom → cause → confirm → fix table you read mid-incident, then the entries that bite hardest expanded with the exact commands.

#	Symptom	Root cause	Confirm (exact cmd / metric)	Fix
1	High p99 on a sync API; p50 fine	Cold starts on the critical path	Logs Insights: high `coldStarts/invocations`; X-Ray init subsegment	PC on the alias; SnapStart if eligible
2	Function “slow”, cost unchanged at 128 MB	CPU-starved at low memory	Power Tuning shows duration drops with memory	Raise to the Power-Tuning winner
3	Provisioned concurrency “not working”	PC on `$LATEST`, or not `READY`, or wrong qualifier	`get-provisioned-concurrency-config` Status ≠ READY	Target a version/alias; wait for READY
4	p99 still spikes despite PC	Demand exceeds the PC pool (spillover)	`ProvisionedConcurrencySpilloverInvocations` > 0	Raise PC floor / AutoScaling max; pre-scale
5	Duplicate IDs / “same value” across requests	SnapStart cloned a UUID/seed	IDs identical post-restore; only on SnapStart fns	Regenerate in `afterRestore`, not at init
6	Auth/connection failures right after deploy	SnapStart restored stale creds/connections	Failures cluster on cold (restored) envs	Refresh creds/reconnect in `afterRestore`
7	First request slow even with SnapStart	JVM JIT/lazy-load not primed	First post-restore call slow, then fast	Prime hot paths in `beforeCheckpoint`
8	DB dependency timeouts at scale, fine at rest	Connection blowup (per-invocation connects)	RDS connection count near `max_connections`	Init-scope reuse; front with RDS Proxy
9	`429 TooManyRequestsException` under load	Reserved/account concurrency limit hit	`Throttles` climbing; `ConcurrentExecutions` at cap	Raise reserved; Service Quotas increase
10	Throttles during a fast ramp, below account limit	Burst-concurrency ceiling outrun	`ConcurrentExecutions` slope caps then throttles	Pre-scale PC for the window; smooth spike
11	Async events delayed / landing in DLQ	Throttled async invocations retrying	`Throttles` on an async-triggered fn; DLQ depth	Raise concurrency; check the DLQ cause
12	Init phase takes seconds	Oversized package / eager init work	High `Init Duration`; large `unzip -l`	Trim deps, mark SDK external, lazy-init
13	Memory bill high, function fast	Over-allocated memory	`Max Memory Used` ≪ `Memory Size`	Drop memory toward peak + headroom
14	Costs jumped after a “latency fix”	Flat 24x7 provisioned concurrency	PC config shows a high static floor	AutoScaling: low floor + scheduled/target ramp

The expanded form for the entries that cause the most wasted hours:

1. High p99 on a synchronous API, p50 healthy. Root cause: cold starts on the critical path — new environments paying init as traffic steps up. Confirm: Logs Insights shows a high count(@initDuration)/count(*) ratio on that function; X-Ray’s init subsegment shows the time landing in init, not your handler. Fix: right-size memory first (free), then provisioned concurrency on the alias for a flat tail; SnapStart if the runtime is Java/Python/.NET and cost matters more than absolute consistency.

3. Provisioned concurrency “isn’t doing anything.” Root cause: PC was put on $LATEST (rejected) or on a different qualifier than the one traffic hits, or it isn’t READY yet and traffic was shifted early. Confirm: aws lambda get-provisioned-concurrency-config --qualifier <alias> returns Status: IN_PROGRESS/FAILED, or the alias your API points to has no PC config. Fix: attach PC to the version/alias your traffic actually uses; wait for READY before promoting; ensure the API stage points at the PC-backed alias.

5 & 6. SnapStart cloned state / restored stale connections. Root cause: values generated once at snapshot (UUIDs, SecureRandom seed, timestamps) are identical across every restore; connections/creds captured live are dead/expired on restore. Confirm: duplicate IDs appear only on SnapStart-enabled functions and cluster on cold (restored) environments; auth/connection failures cluster immediately post-deploy. Fix: regenerate per-environment values and re-seed SecureRandom in afterRestore; refresh credentials and re-establish connections there too. Never rely on init-time randomness under SnapStart.
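
The cloning bug is easier to believe once you have reproduced it. The sketch below is a local simulation only, not the Lambda runtime: it assumes a hypothetical Environment class, models "snapshot and restore" with copy.deepcopy, and calls an after_restore method by hand where a real function would register the runtime's restore hook.

```python
import copy
import uuid


class Environment:
    """Stand-in for one execution environment's init-scope state."""

    def __init__(self) -> None:
        # Runs during init, i.e. before the snapshot is taken.
        self.instance_id = uuid.uuid4().hex

    def after_restore(self) -> None:
        # Regenerate anything that must be unique per environment.
        self.instance_id = uuid.uuid4().hex


def restore(snapshot: Environment, run_hooks: bool) -> Environment:
    """Simulate a restore: every environment starts as a copy of the snapshot."""
    env = copy.deepcopy(snapshot)
    if run_hooks:
        env.after_restore()
    return env


snapshot = Environment()

# Without the hook, two "restored" environments share the init-time value.
a, b = restore(snapshot, run_hooks=False), restore(snapshot, run_hooks=False)
duplicated = a.instance_id == b.instance_id          # True: the bug

# With the hook, each environment gets its own value.
c, d = restore(snapshot, run_hooks=True), restore(snapshot, run_hooks=True)
unique = c.instance_id != d.instance_id              # True: the fix
```

A test along these lines, run against the real restore path, is what catches a duplicated idempotency key before it reaches production.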

8. Relational dependency times out under load, fine at rest. Root cause: the handler opens a connection per invocation, so at N concurrent environments you need N connections; past max_connections the database refuses connects, surfacing as Lambda-side dependency timeouts. Confirm: RDS/Aurora connection count climbs toward max_connections exactly as Lambda concurrency rises; the failures are connects, not query errors. Fix: move the connection to init scope and reuse it; for high concurrency front the database with RDS Proxy so it multiplexes many environments onto a small backend pool.

9 & 10. Throttles — but which wall? Root cause: either a reserved-concurrency cap on the function, the account/regional limit, or the burst ceiling during a sharp ramp. Confirm: Throttles rising with ConcurrentExecutions pinned at the account limit → account cap; Throttles on one function with others fine → its reserved cap; throttles while ConcurrentExecutions is still climbing and below the limit → burst ceiling. Fix: Service Quotas increase ahead of a launch for the account cap; raise (or accept, if it’s protecting a downstream) the reserved cap; pre-scale provisioned concurrency for a known flash window the burst ramp can’t keep up with.

14. The bill jumped after someone “fixed” latency. Root cause: provisioned concurrency nailed to a flat 24x7 number, paying for warmth at 3 AM for a daytime workload. Confirm: the PC config shows a high static provisioned-concurrent-executions with no Application Auto Scaling target/schedule attached. Fix: set a small always-on floor and let Application Auto Scaling raise it on a schedule (diurnal peaks) or a utilization target; reserve a higher static floor only for genuine flash windows.
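
The size of the overspend is easy to estimate because provisioned concurrency bills on configured environments multiplied by time. A back-of-envelope sketch, assuming a hypothetical 4-hour daily peak window; it counts only the floor and the scheduled ramp, so any extra capacity that target tracking adds between them would reduce the saving.

```python
from typing import List, Tuple


def pc_environment_hours(schedule: List[Tuple[float, int]]) -> float:
    """Sum of hours x provisioned environments across one day's schedule."""
    return sum(hours * provisioned for hours, provisioned in schedule)


# Flat configuration: 300 environments around the clock.
flat = pc_environment_hours([(24, 300)])                 # 7200
# Floor of 10 off-peak, 150 during a hypothetical 4-hour peak window.
scheduled = pc_environment_hours([(20, 10), (4, 150)])   # 800
saving = 1 - scheduled / flat                            # about 0.89
```

Multiply the environment-hours by your region's provisioned-concurrency rate and configured memory to turn this into currency.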

Best practices

Right-size memory before you buy warmth. It costs nothing to try and often cuts both latency and cost. Run Power Tuning with a representative payload, not an empty event.
Treat init scope as sacred. SDK clients, connections, and static config go outside the handler; only per-request state goes inside. This single discipline kills most self-inflicted latency.
Trim the package. Mark the bundled SDK external, prune dev deps, tree-shake. Init time tracks package size closely.
Put provisioned concurrency on an alias, never $LATEST, and drive it with Application Auto Scaling — a low floor plus a schedule/target, not a flat 24x7 number.
Wait for READY before shifting traffic to a PC-backed alias; serving while IN_PROGRESS still cold-starts.
For SnapStart, always implement afterRestore to re-seed randomness, regenerate per-env values, and refresh credentials/connections — and beforeCheckpoint to prime hot paths. Test the restore path; an un-re-seeded UUID is a production bug.
Reuse connections; front relational DBs with RDS Proxy above a few hundred concurrent executions, so you never exhaust max_connections.
Reserve concurrency on functions fronting fragile downstreams to partition the account and protect the dependency — and to stop one noisy function starving the rest.
Request a regional quota increase before a launch, not during the incident, and pre-scale PC for known flash windows the burst ceiling can’t ramp into.
Instrument from day one: active X-Ray tracing, Lambda Insights, and alarms on Throttles, ProvisionedConcurrencySpilloverInvocations, and p99. Tuning blind is guessing.
Verify the number moved after every change — run the Logs Insights query over an after-window and confirm avgInitMs, p99, peak memory and spillover went the right way. Don’t assume; measure.
Pick the cheapest config that meets the SLO, per path. Spend on warmth only where the tail-latency SLO actually requires it.
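For the "verify the number moved" practice, the same before/after comparison can be approximated offline from exported REPORT lines. This is a rough sketch that parses the fields the REPORT line carries; the sample lines and request IDs are made up, and it reports the maximum duration rather than a true p99.

```python
import re
from statistics import mean

# Longer field names come first so "Duration" does not shadow them.
FIELD = re.compile(
    r"(Init Duration|Billed Duration|Duration|Max Memory Used|Memory Size): "
    r"([\d.]+) (?:ms|MB)"
)


def summarize(report_lines):
    """Summarize cold starts, init time and memory from REPORT log lines."""
    rows = [{k: float(v) for k, v in FIELD.findall(line)} for line in report_lines]
    rows = [r for r in rows if "Duration" in r]          # skip non-REPORT lines
    cold = [r["Init Duration"] for r in rows if "Init Duration" in r]
    return {
        "invocations": len(rows),
        "cold_starts": len(cold),                        # Init Duration appears only on cold starts
        "avg_init_ms": mean(cold) if cold else 0.0,
        "max_duration_ms": max((r["Duration"] for r in rows), default=0.0),
        "peak_memory_mb": max((r.get("Max Memory Used", 0.0) for r in rows), default=0.0),
    }


sample = [
    "REPORT RequestId: a1\tDuration: 812.40 ms\tBilled Duration: 813 ms\t"
    "Memory Size: 1024 MB\tMax Memory Used: 310 MB\tInit Duration: 640.00 ms",
    "REPORT RequestId: a2\tDuration: 38.10 ms\tBilled Duration: 39 ms\t"
    "Memory Size: 1024 MB\tMax Memory Used: 312 MB",
]
summary = summarize(sample)
```

Run it over a window before the change and a window after, then compare the two summaries field by field.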

Security notes

Fetch DB credentials via IAM, not embedded secrets. With RDS Proxy the function authenticates using IAM while the proxy itself retrieves the database secret from Secrets Manager, so no password sits in an environment variable. Pair with least-privilege on the proxy’s role. (See AWS Secrets Manager vs SSM Parameter Store, In Depth: Secrets, Rotation & Config.)
Least-privilege execution role. Grant only the actions the function uses (specific DynamoDB tables, the one Secrets Manager secret ARN, the KMS key). A broad * role is a blast-radius and a finding.
Re-seed cryptographic randomness under SnapStart. A SecureRandom captured in the snapshot is predictable and identical across restores — a genuine security defect, not just a duplicate-ID bug. Use the AWS Cryptography libraries or re-seed in afterRestore.
Don’t log secrets or full payloads. Logs Insights queries should target REPORT and metadata, not request bodies; scrub PII before it reaches CloudWatch.
Keep the function in a VPC only when it needs private resources (RDS Proxy, private endpoints). VPC-attached functions reach AWS APIs via VPC endpoints (interface or gateway) or a NAT path, so design egress deliberately rather than opening it wide.
Encrypt environment variables with a customer-managed KMS key when they hold anything sensitive, and rotate the underlying secrets (Secrets Manager rotation) rather than the env var.
Scope reserved concurrency as a safety control too — it bounds how many connections or downstream calls a compromised or runaway function can make.

Several of these security controls also improve resilience; here the two pull in the same direction:

Control	Mechanism	Secures against	Also prevents
IAM DB auth via RDS Proxy	Proxy `IAMAuth: REQUIRED`	Embedded DB passwords	Credential staleness under SnapStart
Least-privilege exec role	Scoped IAM policy	Over-broad blast radius	Accidental calls to wrong resources
Re-seed `SecureRandom` (afterRestore)	CRaC hook	Predictable/cloned randomness	Duplicate idempotency keys
Reserved concurrency cap	`reserved-concurrent-executions`	Runaway downstream abuse	DB connection exhaustion
KMS-encrypted env vars	Customer-managed key	Plaintext secret exposure	(audit/rotation hygiene)
Secrets Manager rotation	Automatic rotation	Long-lived static creds	Credential drift breaking the app

Cost & sizing

The bill drivers, how they interact with the fixes, and what to watch:

GB-seconds (memory × billed duration) is the core cost. Right-sizing can lower it even while raising memory, because the work finishes in fewer seconds — which is why “memory is CPU” is also a cost lever, not just a latency one.
Provisioned concurrency has two parts: a charge for the time it’s enabled (idle or not) plus a reduced per-request/duration charge when used. A flat 24x7 floor is the classic overspend; Application Auto Scaling with a low floor and a schedule/target is far cheaper.
SnapStart adds no idle charge — you pay only per invocation (with a small per-restore element on some runtimes) — which is exactly why cost-sensitive JVM workloads favour it over PC.
Requests are billed per invocation; high-throughput functions accumulate request cost independent of duration.
RDS Proxy adds a small hourly per-vCPU charge on the proxied instance — trivial next to the cost of a database falling over from connection exhaustion during a sale.
CloudWatch Logs/metrics/X-Ray are billed per GB ingested / per trace — worth it, but use log retention and trace sampling on high-traffic functions so a flash sale doesn’t spike the telemetry bill.
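The GB-seconds point is easiest to see with numbers. The sketch below assumes an illustrative price per GB-second and two hypothetical measurements of the same CPU-bound function (2,400 ms at 128 MB, 250 ms at 1,024 MB); check current pricing and your own Power Tuning results before relying on the figures.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # assumed illustrative rate, not a quoted price


def duration_cost(memory_mb, billed_ms, invocations, price=PRICE_PER_GB_SECOND):
    """Duration charge = configured memory (GB) x billed duration (s) x invocations x rate."""
    gb_seconds = (memory_mb / 1024) * (billed_ms / 1000) * invocations
    return gb_seconds * price


# Hypothetical measurements for one million invocations.
small = duration_cost(128, 2400, 1_000_000)   # starved of CPU, slow
large = duration_cost(1024, 250, 1_000_000)   # 8x the memory, roughly 10x faster

print(f"128 MB:  {small:.2f}")   # 5.00
print(f"1024 MB: {large:.2f}")   # 4.17
```

Under these assumed numbers the larger setting is both faster and cheaper, because the duration fell by more than the memory rose. Request charges are the same in both cases and are left out.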

A rough monthly picture for a single busy synchronous function, before vs after deliberate tuning:

Cost driver	What you pay for	Rough INR / month	What it fixes	Watch-out
GB-seconds (right-sized)	Memory × duration, tuned	varies with traffic	Slow warm path	Bills more if over-allocated
Provisioned concurrency (flat 300)	24x7 warm pool	~₹2.4 L (the anti-pattern)	Cold starts	Pays for 3 AM warmth
Provisioned concurrency (floor 10 + ramp)	Warm only at peak	~₹70–80 k	Cold starts, cost-aware	Pre-scale for flash spikes
SnapStart	Per-invoke restore only	~₹0 idle	JVM init tax	Snapshot caveats to code around
RDS Proxy	Hourly per-vCPU	~₹1.5–3 k	Connection blowup	Needs VPC/IAM
Requests	Per invocation	scales with traffic	(inherent)	High-throughput accumulates
Observability	Logs/metrics/traces per GB	~₹1–3 k	Tuning blind	Sample + set retention

The sizing rule in one line: find the cheapest memory that meets the warm-path latency, add only the warmth (PC or SnapStart) the cold-start SLO needs, and scale that warmth to demand. Solvent Pay landed at ~₹72,000/month after doing exactly this — down 70% from the flat-300 anti-pattern — proof that the fix is usually deliberate tuning, not a bigger anything.

Interview & exam questions

1. Why is a Lambda function at 128 MB not necessarily “cheap”? Lambda allocates CPU proportionally to memory, so a CPU-bound function at 128 MB runs on a sliver of a vCPU and takes far longer — and since billing is GB-seconds, the slower run can cost the same or more than a higher-memory run that finishes quickly, while delivering worse latency. Right-sizing with Power Tuning often lowers both latency and cost.

2. What are the three parts of a cold start, and which do you control most? Environment download/init (microVM provision, package/image pull, runtime start — mostly AWS), your init phase (imports, SDK clients, connections — the part you control and that’s billed), and the warm invoke (your handler). You have the most leverage over the init phase via package trimming and lazy initialization.

3. Provisioned concurrency vs SnapStart — when do you pick each? Provisioned concurrency pre-initializes a pool for the flattest tail on any runtime, but you pay for it whenever it’s enabled. SnapStart restores from a snapshot of init (Java/Python/.NET) with no idle cost but adds snapshot caveats and restore/priming variance. Pick PC for strict p99 or unsupported runtimes; pick SnapStart for cost-sensitive JVM/.NET/Python cold starts; combine them for JVM workloads with both pressures.

4. Why must provisioned concurrency target a version or alias, never $LATEST? $LATEST is mutable, so PC can’t guarantee a stable, pre-initialized snapshot of code/config against it — AWS rejects it. Targeting an immutable version (usually via an alias) enforces a deploy-then-shift model where you publish, warm, confirm READY, then move traffic.

5. What breaks if you don’t implement afterRestore under SnapStart? Anything generated once at snapshot — SecureRandom seeds, UUIDs, timestamps — is cloned identically across every restored environment, and captured connections/credentials may be dead or expired. You get duplicate IDs (e.g. idempotency keys), predictable randomness (a security flaw), and auth/connection failures on cold starts. afterRestore re-seeds and refreshes these per environment.

6. How does SnapStart’s beforeCheckpoint hook help latency? Restore is fast, but the JVM still JIT-compiles and lazy-loads on the first real request, so the first post-restore call can be slow. beforeCheckpoint runs at publish time and lets you prime hot paths (exercise serializers, make a dummy SDK call) so that compiled/loaded state is captured in the snapshot and the first real request is already fast.

7. What is connection blowup on Lambda, and how do you fix it? Each concurrent environment that opens its own database connection multiplies connections by concurrency; at a few hundred concurrent executions you exhaust the database’s max_connections and it refuses connects, surfacing as dependency timeouts. Fix by opening connections in init scope and reusing them, and by fronting the database with RDS Proxy, which multiplexes many environments onto a small backend pool.

8. Reserved vs provisioned concurrency? Reserved concurrency carves a guaranteed-and-capped slice of the account limit for a function (protecting downstreams and partitioning the account) but does not pre-warm anything. Provisioned concurrency keeps a set number of environments initialized and ready on a version or alias; it counts against the function’s reserved concurrency when one is set, but it is not a cap, because demand beyond the pool spills over to on-demand environments. Reserved bounds; provisioned warms.

9. How does a throttle surface differently for synchronous, asynchronous, and event-source invocations? Synchronous callers (API Gateway, direct invoke) get an immediate 429 TooManyRequestsException and must retry themselves. Asynchronous invocations (S3, SNS, EventBridge) are retried by Lambda with backoff and, once retries are exhausted, go to a DLQ or on-failure destination if one is configured; otherwise the event is dropped. Event source mappings (SQS, Kinesis, DynamoDB Streams) retry per the source’s rules, so the backlog and iterator age grow.

10. You enabled provisioned concurrency but p99 still spikes under load. Why and what do you check? Demand is exceeding the provisioned pool, so the overflow runs on-demand and cold-starts. Check ProvisionedConcurrencySpilloverInvocations — any sustained non-zero value means raise the PC floor or the Application Auto Scaling max, and pre-scale for known flash windows since target-tracking reacts too slowly for sharp spikes.

11. What’s the single fastest way to tell a cold start from a warm one in the logs? The REPORT line includes Init Duration only on cold starts. Filter on its presence (e.g. count(@initDuration) in Logs Insights) to measure cold-start frequency and cost without any extra instrumentation.

12. Why pre-scale provisioned concurrency for a flash sale instead of relying on target-tracking? Target-tracking Application Auto Scaling reacts over minutes, and even on-demand scale-out is bounded by the burst-concurrency ceiling, so a spike that arrives in seconds outruns both and cold-starts (or throttles). A scheduled action that ramps the PC floor before the known window keeps the pool warm ahead of the traffic.

These map to AWS Certified Developer – Associate (DVA-C02) — develop, deploy and troubleshoot serverless applications, Lambda configuration, concurrency, and observability — and AWS Certified Solutions Architect – Associate (SAA-C03) for the architecture trade-offs (PC vs SnapStart, RDS Proxy, API Gateway fronting). The performance-and-cost optimization angle touches AWS Certified DevOps Engineer – Professional (DOP-C02). A compact cert mapping:

Question theme	Primary cert	Objective area
Memory-as-CPU, right-sizing, GB-seconds	DVA-C02	Optimize serverless cost/performance
PC vs SnapStart, aliases/versions	DVA-C02	Deploy & configure Lambda
Concurrency model, throttles, quotas	DVA-C02 / SAA-C03	Resilient serverless design
RDS Proxy, connection reuse	SAA-C03	Design scalable data tiers
CloudWatch/X-Ray, Logs Insights	DVA-C02 / DOP-C02	Instrument & troubleshoot
Application Auto Scaling for PC, pre-scaling for spikes	DOP-C02	Automation & scaling

Quick check

A synchronous API’s p50 is 40 ms but p99 is 6 seconds under load. What’s the most likely cause, and the first (free) lever before you spend money?
True or false: scaling a function to more memory always costs more.
You put provisioned concurrency on a function but it still cold-starts. Name two things to check.
Under SnapStart, two environments returned the same idempotency key. What went wrong and where do you fix it?
Your relational database starts refusing connections exactly as Lambda concurrency climbs past a few hundred. What’s happening and what’s the fix?

Answers

Cold starts on the critical path as new environments spin up under load (p50 is the warm path, p99 the cold tail). The first lever is right-sizing memory with Power Tuning — it’s free and often cuts the warm duration too; only then do you buy warmth (provisioned concurrency on the alias, or SnapStart if the runtime supports it).
False. Memory is CPU, so more memory can make a CPU-bound function finish in far fewer GB-seconds — lowering the bill and the latency. You only overspend if you allocate memory the function doesn’t use (check Max Memory Used).
Check (a) that PC targets a version or alias your traffic actually hits, never $LATEST, and (b) that its Status is READY (not IN_PROGRESS/FAILED) before traffic was shifted. Also watch ProvisionedConcurrencySpilloverInvocations — non-zero means demand exceeds the pool.
SnapStart captured a value generated once at snapshot (a UUID/SecureRandom seed) and cloned it across every restored environment. Regenerate per-environment values and re-seed randomness in the afterRestore hook, not at init/class-load.
Connection blowup — each concurrent environment opened its own connection, exhausting the database’s max_connections. Fix by reusing the connection in init scope and fronting the database with RDS Proxy, which multiplexes many environments onto a small backend pool.

Glossary

Execution environment — the Firecracker microVM that runs one Lambda invocation at a time; cold when newly created, warm when reused.
Cold start — the latency of provisioning an environment and running the init phase before the first invocation; visible as Init Duration in the REPORT line.
Init phase — code outside the handler (imports, SDK clients, static config, connections) that runs once per environment and is billed; the cold-start cost you most control.
Warm invocation — a handler call on a reused environment that skips environment init and the init phase.
Memory-is-CPU — Lambda allocates CPU proportionally to configured memory; ~1,769 MB ≈ one full vCPU.
GB-second — the billing unit: configured memory (GB) × billed duration (seconds).
Provisioned concurrency (PC) — a pool of environments kept initialized and ready on a version/alias; removes cold starts but is charged while enabled.
SnapStart — runs init once at publish, snapshots the microVM, and restores from it on cold start (Java/Python/.NET); no idle charge, with snapshot caveats.
CRaC (beforeCheckpoint/afterRestore) — the runtime hooks SnapStart uses: prime hot paths before the snapshot; re-seed/refresh per-environment state after restore.
Reserved concurrency — a guaranteed-and-capped slice of the account concurrency limit for one function; partitions the account and protects downstreams.
Regional/account concurrency limit — total in-flight executions allowed in a region (default 1,000, raisable).
Burst concurrency — the ceiling on how fast you can scale from cold before the per-minute ramp toward the account limit.
Throttle — an invocation rejected at a concurrency limit; 429 for synchronous callers, retried-then-DLQ for async, source-dependent for event source mappings.
Spillover — invocations that exceed the provisioned-concurrency pool and run on-demand (cold); tracked by ProvisionedConcurrencySpilloverInvocations.
RDS Proxy — a managed connection pooler that multiplexes many Lambda environments onto a small set of backend database connections and supports IAM auth.
Init scope (module/static scope) — code outside the handler whose objects (clients, connections) persist across warm invocations on the same environment.
Lambda Power Tuning — an open-source Step Functions state machine that sweeps memory settings and plots cost vs speed to find the optimum.
REPORT line — the per-invocation CloudWatch Logs summary carrying Init Duration (cold only), Duration, Billed Duration, Max Memory Used, and Memory Size.

Next steps

You can now decompose Lambda latency, right-size for free, buy down the cold start that remains, and prove every change. Build outward:

Next: AWS Lambda, In Depth: Runtimes, Triggers, Layers, Concurrency & Every Setting — the full mechanics under every knob in this article.
Related: RDS Proxy in Production: Connection Pooling, Failover Acceleration, and IAM Authentication — go deep on the connection fix for high-concurrency relational access.
Related: Amazon API Gateway, In Depth: REST vs HTTP vs WebSocket APIs, Integrations & Authorizers — the synchronous entry path and its 29-second ceiling.
Related: Distributed Tracing on AWS with X-Ray: Service Maps, Segments, and ADOT on EKS — segment-level latency attribution to prove the cost is yours or a dependency’s.
Related: AWS Step Functions in Production: Express vs Standard, Distributed Map, and Resilient Error Handling — the orchestration engine behind Power Tuning, and where long-running work belongs instead of a Lambda.