AWS Lesson 27 of 123

DynamoDB Single-Table Design: Modeling Access Patterns, GSIs, and Hot Partition Avoidance

Relational instincts are the single biggest reason DynamoDB projects go sideways. You normalize entities into tables, then discover the only join DynamoDB offers is the one you pre-compute at write time. Single-table design is the discipline of collapsing many entity types into one physical table so the queries your application runs become single-partition reads. It is not about saving on table count — it is about co-locating related items so a Query returns a parent and its children in one round trip, at single-digit-millisecond latency, regardless of table size.

This is the process I follow on every greenfield DynamoDB design: enumerate access patterns first, design keys to satisfy them, overload indexes, then defend the schema against hot partitions and the 400 KB item ceiling. The order matters. Start from the entities and you will refactor; start from the queries and you will ship. By the end you will be able to take an access-pattern list, derive a key schema and a set of overloaded GSIs that serve every read in one call, and prove the model against real consumed-capacity numbers rather than intuition.

This article is a reference you will return to mid-design and mid-incident, so the moving parts (key components, GSI projection choices, capacity-mode trade-offs, error/limit codes, and a symptom→cause→confirm→fix playbook) are all laid out as scannable tables alongside the prose and the AWS CLI, Terraform, and Python snippets. Read the prose once to build the mental model, then keep the tables open while you are actually modeling.

What problem this solves

The pain DynamoDB single-table design addresses is specific: a NoSQL store that punishes you for thinking relationally. In a relational database you model the entities, normalize, and let the query planner figure out joins at read time. DynamoDB has no query planner and no server-side join. The only “join” is placement — items you decided at write time to store under the same partition key. If you model entities into separate tables the relational way, every screen that shows a parent and its children becomes two or three round trips, each its own network hop and its own capacity charge, and the “list all X for a Y” screens degrade into Scan operations that read the entire table and throw most of it away.

What breaks without this discipline: latency that climbs with table size instead of staying flat, capacity bills dominated by Scan-and-filter waste, and throttling that looks like insufficient capacity but is actually a hot key. Teams “fix” the throttling by raising provisioned capacity, the bill doubles, and the hot partition still throttles because the ceiling is per-partition, not per-table. Others discover the 400 KB item limit the day a rollup array finally crosses it in production, with a ValidationException and no graceful degradation.

Who hits this: anyone building a serverless or high-scale application on DynamoDB — multi-tenant SaaS, order/inventory systems, event logs, social graphs, session stores. It bites hardest on time-series workloads (the date-as-partition-key trap), mega-aggregate items (an order with thousands of line items, a shipment with thousands of scans), and any service whose access patterns were not written down before the keys were named. The fix is almost never “add capacity” — it is “make the keys do the selection the query needs.”

To frame the whole field before the deep dive, here is every failure class this article covers, the relational instinct that causes it, and the single-table move that prevents it:

Failure class The relational instinct behind it What it costs in production The single-table move
Multi-round-trip reads Normalize entities into separate tables 2–3 network hops + capacity per screen Co-locate parent + children under one PK (item collection)
Scan creep “Just filter the table” for a new pattern RCU scales with table size, not result size Design keys / add an overloaded GSI so KeyConditionExpression selects
Hot partition throttling Key on something low-cardinality (date, status) ThrottledRequests while the table sits at 40% of provisioned capacity Write-shard the PK to fan across physical partitions
400 KB item failure Append to one growing attribute (array) ValidationException with no fallback Model each increment as its own item (adjacency list)
GSI write bottleneck Treat indexes as free Under-provisioned GSI throttles the base table Size the GSI to base write rate; narrow the projection
Stale data after a new GSI Assume an index sees existing rows Queries silently miss history Idempotent, throttled backfill before you query

Learning objectives

By the end of this article you can:

Prerequisites & where this fits

You should already understand DynamoDB’s primitives: a table holds items (rows) made of attributes; an item is identified by a primary key that is either a single partition key or a composite partition key (PK) + sort key (SK); read capacity units (RCU) and write capacity units (WCU) meter throughput; and the API verbs are GetItem, PutItem, UpdateItem, DeleteItem, Query, Scan, BatchGetItem, BatchWriteItem, and TransactWriteItems/TransactGetItems. You should be comfortable running the AWS CLI, reading DynamoDB JSON (the {"S": "..."} typed form), and reasoning about eventual versus strong consistency.

This sits in the AWS data-modeling track and is the deep, prescriptive companion to the broader DynamoDB Deep Dive: Tables, Keys, Capacity, GSIs & Streams — that article surveys the service; this one is the schema-design craft. It pairs with DynamoDB Streams: Change Data Capture & Event-Driven Pipelines for the reshaping/reconciliation machinery, and the same partition-design principles transfer to Cosmos DB Partition Key Design & RU Optimization on Azure. Where this fits in the bigger picture: it is upstream of every serverless API you build on DynamoDB, because the key schema decides whether your Lambda handlers run one query or three.

A quick map of who owns which decision in a single-table design, so the right person reviews the right thing:

Decision layer What is decided here Who usually owns it What goes wrong if skipped
Access-pattern list Every read/write, filter, sort, cardinality Product + backend lead Schema gets refactored after launch
Key schema (PK/SK) Overloading, prefixes, item collections Data modeler / senior backend Multi-round-trip reads; Scan creep
GSI design Overloading, sparseness, projection Data modeler Wrong access shape; write bottleneck
Capacity mode On-demand vs provisioned + auto scaling Backend + FinOps Over-pay or throttle under spike
Hot-key defense Write-sharding, fan-out reads Senior backend Per-partition throttling at peak
Schema evolution Online GSI add, backfill, Streams Platform / data eng Downtime or stale index data

Core concepts

Six mental models make every later decision obvious.

The schema is downstream of the queries. In DynamoDB you do not model entities and then query them; you enumerate the queries and then design keys that make each query a single-partition read. The access-pattern list — every read and write with its filter, sort order, and cardinality — is the artifact you review with the team. If you cannot serve an access pattern with one Query or GetItem, the schema is incomplete, not the application.

Filtering is not querying. A KeyConditionExpression selects items by key before reading them; a FilterExpression runs after the key query and before results return. Filtering reduces the payload you receive but never the capacity consumed or the items examined. Any pattern that can only be served by Scan + FilterExpression reads the whole table and pays for it — that is a modeling bug, not a tuning opportunity.
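To make the capacity difference concrete, here is a toy in-memory model. It is a sketch under stated assumptions, not how DynamoDB is implemented: `ITEMS`, `query`, and `scan_with_filter` are hypothetical names, and the `examined` count stands in for the items DynamoDB reads (and bills) before any FilterExpression is applied.

```python
# Toy model of what must be read for a key condition vs. a Scan + filter.
# Hypothetical in-memory "table"; each dict is one item.
ITEMS = [
    {"PK": "CUST#a1b2", "SK": "PROFILE"},
    {"PK": "CUST#a1b2", "SK": "ORDER#2026-06-01#o-9001", "status": "OPEN"},
    {"PK": "CUST#a1b2", "SK": "ORDER#2026-06-03#o-9044", "status": "SHIPPED"},
    {"PK": "CUST#z9y8", "SK": "PROFILE"},
    {"PK": "CUST#z9y8", "SK": "ORDER#2026-06-02#o-9010", "status": "OPEN"},
]


def query(items, pk, sk_prefix=""):
    """Key condition: only items under one PK (and SK prefix) are read."""
    matched = [i for i in items if i["PK"] == pk and i["SK"].startswith(sk_prefix)]
    return matched, len(matched)  # (returned, examined): the key did the selecting


def scan_with_filter(items, predicate):
    """Scan + FilterExpression: every item is read, then non-matches are dropped."""
    returned = [i for i in items if predicate(i)]
    return returned, len(items)  # examined == the whole table, however few match


orders, examined = query(ITEMS, "CUST#a1b2", "ORDER#")
print(len(orders), examined)  # 2 2

orders, examined = scan_with_filter(
    ITEMS, lambda i: i["PK"] == "CUST#a1b2" and i["SK"].startswith("ORDER#")
)
print(len(orders), examined)  # 2 5
```

Both calls return the same two orders; only the Scan version pays for all five items, and that gap grows with the table rather than with the result.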

Entity overloading is the core trick. Name the keys generically (PK, SK) and encode the entity type into the value with a prefix (CUST#, ORDER#, ITEM#). Now different entity types coexist in one table, and a single partition can hold a parent row and all its child rows — an item collection — so one Query returns the aggregate. The same idea applied to a secondary index is index overloading: generic GSI1PK/GSI1SK that each entity populates with whatever it needs to be found by.
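Because the prefixes are a contract every writer and reader shares, it helps to build them in exactly one place. A minimal sketch; `key`, `customer_pk`, `order_pk`, and `order_sk` are hypothetical helper names, and rejecting components that contain the delimiter is a design choice of this sketch, not a DynamoDB rule.

```python
# Hypothetical key-builder helpers for an overloaded single-table schema.
DELIM = "#"


def key(*parts: str) -> str:
    """Join a type prefix and its components, refusing values that contain the delimiter."""
    for part in parts:
        if DELIM in part:
            raise ValueError(f"key component {part!r} contains {DELIM!r}")
    return DELIM.join(parts)


def customer_pk(customer_id: str) -> str:
    return key("CUST", customer_id)


def order_pk(order_id: str) -> str:
    return key("ORDER", order_id)


def order_sk(order_date: str, order_id: str) -> str:
    # The ISO-8601 date comes first so string order matches chronological order.
    return key("ORDER", order_date, order_id)


print(customer_pk("a1b2"))               # CUST#a1b2
print(order_sk("2026-06-01", "o-9001"))  # ORDER#2026-06-01#o-9001
```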

A GSI is an alternate, asynchronous view. A Global Secondary Index is a second (PK, SK) over the same items, maintained for you on every write with a small propagation delay (eventually consistent only). It has its own throughput and its own projected copy of attributes. Two GSI behaviors are load-bearing: a sparse index contains an item only if the item has both of that index’s key attributes (so you index a working set, not everything), and under provisioned mode an under-provisioned GSI can throttle the base table’s writes.

Partitions are physical and capped. DynamoDB hashes the partition key to choose a physical partition, and a single partition sustains roughly 1,000 WCU and 3,000 RCU. Exceed either on one key and you throttle even when table-level capacity is healthy. Adaptive capacity shifts throughput toward busy partitions and can isolate a single hot item, but it cannot exceed those per-partition limits — a key that needs more than ~1,000 WCU must spread across more than one partition-key value.
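The per-partition ceilings are easier to reason about with the unit arithmetic written out. A sketch, assuming the standard sizing rules (1 WCU per write of up to 1 KB, 1 RCU per strongly consistent read of up to 4 KB, half that for eventually consistent reads, double for transactional writes); the function names are hypothetical.

```python
import math

PARTITION_WCU_CAP = 1_000  # approximate write ceiling of one physical partition


def wcu_per_write(item_bytes: int, transactional: bool = False) -> int:
    """Write units for one write: 1 per started KB, doubled inside a transaction."""
    units = max(1, math.ceil(item_bytes / 1024))
    return 2 * units if transactional else units


def rcu_per_read(item_bytes: int, strongly_consistent: bool = True) -> float:
    """Read units for one read: 1 per started 4 KB, halved for eventual consistency."""
    units = max(1, math.ceil(item_bytes / 4096))
    return float(units) if strongly_consistent else units / 2


def key_wcu(writes_per_sec: float, item_bytes: int) -> float:
    """Sustained WCU that a single partition key value must absorb."""
    return writes_per_sec * wcu_per_write(item_bytes)


# 400 writes/s of 2.5 KB items on one key: 3 WCU each, 1,200 WCU in total.
demand = key_wcu(400, 2560)
print(demand, demand <= PARTITION_WCU_CAP)  # 1200 False
```

The example key needs 1,200 WCU, so no amount of table-level capacity saves it; it has to be spread over more than one partition key value.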

Hard limits are absolute. An item — all its attributes including names — cannot exceed 400 KB. A sort key value and a partition key value have their own size caps. A table allows at most 20 GSIs with one create/delete in flight at a time. These are not tunable; you model around them. The append-an-unbounded-array pattern is the classic 400 KB trap.
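A rough guard against the 400 KB ceiling can be sketched as below. It assumes every attribute is a string (DynamoDB counts the UTF-8 bytes of attribute names and values; numbers, lists, and maps have their own sizing rules that are not modeled here) and treats 400 KB as 409,600 bytes. `approx_item_bytes`, `fits_in_one_item`, and the `SHIP#` item are hypothetical.

```python
MAX_ITEM_BYTES = 400 * 1024  # assumption: "400 KB" taken as 409,600 bytes


def approx_item_bytes(item: dict) -> int:
    """Approximate size of an all-string item: UTF-8 bytes of every name plus every value."""
    return sum(len(name.encode("utf-8")) + len(value.encode("utf-8"))
               for name, value in item.items())


def fits_in_one_item(item: dict) -> bool:
    return approx_item_bytes(item) <= MAX_ITEM_BYTES


# An append-style attribute keeps growing until the write is rejected outright.
shipment = {"PK": "SHIP#s1", "SK": "META", "scans": "x" * 409_000}
print(fits_in_one_item(shipment))  # True
shipment["scans"] += "x" * 1_000
print(fits_in_one_item(shipment))  # False
```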

The vocabulary in one table

Before the deep sections, pin down every moving part. The glossary at the end repeats these for lookup; this table is the model side by side:

Term One-line definition Where it lives Why it matters to single-table design
Access pattern One concrete read or write the app needs Design doc The thing the schema is derived from
Partition key (PK) Hashed key choosing the physical partition Item primary key Decides co-location and hot-key risk
Sort key (SK) Orders items within a partition Item primary key Enables range queries and adjacency lists
Entity overloading Generic PK/SK + type prefixes Key naming convention Lets many entities share one table
Item collection All items sharing one PK One physical partition One Query returns parent + children
GSI Alternate async (PK, SK) over the items Table-level index Serves a different access shape
Index overloading Generic GSIxPK/GSIxSK per entity GSI key attributes One index serves many logical patterns
Sparse index Item indexed only if it has the keys GSI semantics Indexes a working set (e.g. open orders)
Projection Which attributes a GSI copies GSI definition Drives index storage + write cost
Hot partition One PK taking disproportionate traffic Physical partition Throttles at ~1,000 WCU / ~3,000 RCU
Write sharding Suffixing the PK to fan writes Key value Spreads a high-write key across partitions
Adaptive capacity Auto-shift of throughput to busy keys Platform behavior Smooths skew, can’t beat per-partition cap
Condition expression A write that applies only if a predicate holds Write request Enforces invariants atomically
TransactWriteItems All-or-nothing across ≤100 items Write API Multi-item invariants; costs 2× WCU

Limits and quotas you model around

These are the hard numbers single-table design is engineered against — none are tunable, so the schema absorbs them. Keep this open when you size keys, shards, projections, and transactions:

Limit / quota Value What it constrains The single-table consequence
Item size 400 KB (names + values) One item’s total bytes Append-style data → separate items
Per-partition write ~1,000 WCU Throughput on one physical partition High-write keys must be sharded
Per-partition read ~3,000 RCU Read throughput on one partition Hot read keys must be sharded/cached
Partition key length up to 2,048 bytes PK value size Keep prefixes short
Sort key length up to 1,024 bytes SK value size Bounded path depth in hierarchies
GSIs per table 20 Number of alternate indexes Overload indexes to stay under it
LSIs per table 5 (create-time only) Local indexes Rarely used in single-table design
LSI item-collection size 10 GB per PK Total of a partition + its LSIs A reason to prefer GSIs
TransactWriteItems items 100 Items per transaction Big aggregates split or batch
BatchWriteItem items 25 Items per batch call Loop/paginate large writes
Query/Scan page size 1 MB Bytes returned per call Paginate with LastEvaluatedKey
Provisioned throughput decrease limited per day Scale-down frequency Plan auto-scaling min carefully
Mode switch (on-demand ↔ provisioned) once / 24h Capacity-mode changes Not a runtime knob
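Two of those limits, the 25-request BatchWriteItem cap and the 1 MB Query/Scan page, shape every bulk loop. A minimal sketch; `chunks` and `query_all` are hypothetical helpers, and `query_page` is any callable shaped like the low-level Query call (for example a boto3 DynamoDB client's `query` method). A production writer must also retry any `UnprocessedItems` that BatchWriteItem hands back, which this sketch omits.

```python
from typing import Any, Callable, Iterator

BATCH_WRITE_MAX = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def chunks(requests: list, size: int = BATCH_WRITE_MAX) -> Iterator[list]:
    """Yield BatchWriteItem-sized slices of a list of write requests."""
    for start in range(0, len(requests), size):
        yield requests[start:start + size]


def query_all(query_page: Callable[..., dict], **params: Any) -> list:
    """Follow LastEvaluatedKey until a Query or Scan has no more pages.

    query_page returns a dict with "Items" and, while more pages remain,
    "LastEvaluatedKey".
    """
    items: list = []
    while True:
        page = query_page(**params)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        params["ExclusiveStartKey"] = last_key  # resume where the last page stopped
```

With boto3 this would be called as `query_all(client.query, TableName="app-main", KeyConditionExpression="PK = :pk", ExpressionAttributeValues={":pk": {"S": "CUST#a1b2"}})`.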

1. Working backward: enumerate access patterns before touching keys

The schema is downstream of the queries. Before naming a single attribute, write the complete list of access patterns the service needs — every read and write, with its filter, its sort, and its cardinality. This is the artifact you review with the team, not the data model.

For a multi-tenant order-management service, the list looks like this:

# Access pattern Type Frequency Cardinality concern
A1 Get a customer by ID Read High One item — safe
A2 Get all orders for a customer, newest first Read High Bounded per customer
A3 Get a single order with its line items Read High Bounded per order
A4 List orders in a status (e.g. SHIPPED) for a customer Read Medium Bounded per customer
A5 Get all open orders across all customers (ops dashboard) Read Low Cross-tenant — hot-key risk
A6 Create order + line items atomically Write High Transaction (≤100 items)
A7 Update order status Write High Single-item update

The cardinality column is not decoration. A5 — “all open orders across all customers” — will create a hot partition if modeled naively, because every write funnels into one item collection. Flag those now. Each access pattern then maps to a precise key construction, and writing that mapping table before you create the table is what prevents the post-launch refactor:

# Pattern Served by Key expression Index
A1 Customer by ID GetItem PK=CUST#<id>, SK=PROFILE base
A2 Orders for a customer, newest first Query PK=CUST#<id> AND begins_with(SK,"ORDER#"), ScanIndexForward=false base
A3 Order with line items Query PK=ORDER#<id> base
A4 Orders in a status for a customer Query GSI1PK=CUST#<id>#<status> GSI1
A5 All open orders Query GSI2PK="OPEN" GSI2 (sparse)
A6 Create order + items TransactWriteItems per-item attribute_not_exists(PK) base
A7 Update order status UpdateItem PK=CUST#<id>, SK=ORDER#... + REMOVE GSI2* base
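Because the mapping is mechanical, it can live in code next to the table. A sketch that turns three of the read patterns into keyword arguments for the low-level Query API; the function names are hypothetical, and the index names `GSI1`/`GSI2` assume the indexes are named after the Index column above.

```python
TABLE = "app-main"


def a2_orders_newest_first(customer_id: str) -> dict:
    """A2: all orders for a customer, newest first (base table)."""
    return {
        "TableName": TABLE,
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"CUST#{customer_id}"},
            ":prefix": {"S": "ORDER#"},
        },
        "ScanIndexForward": False,  # descending SK order = newest first
    }


def a4_orders_in_status(customer_id: str, status: str) -> dict:
    """A4: orders in one status for a customer (overloaded GSI1)."""
    return {
        "TableName": TABLE,
        "IndexName": "GSI1",
        "KeyConditionExpression": "GSI1PK = :pk",
        "ExpressionAttributeValues": {":pk": {"S": f"CUST#{customer_id}#{status}"}},
    }


def a5_open_orders() -> dict:
    """A5: every open order across customers (sparse GSI2)."""
    return {
        "TableName": TABLE,
        "IndexName": "GSI2",
        "KeyConditionExpression": "GSI2PK = :open",
        "ExpressionAttributeValues": {":open": {"S": "OPEN"}},
    }
```

Each dict is meant to be unpacked into a client call such as `boto3.client("dynamodb").query(**a2_orders_newest_first("a1b2"))`. Keeping one builder per access pattern makes the pattern list and the code review the same artifact.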

Three rules I hold to, and the consequence of breaking each:

Rule Why it holds What breaking it costs
No Scan in the steady state Scan reads every item then filters RCU scales with table size; latency grows unbounded
Filtering is not querying FilterExpression runs after the key query You pay capacity for discarded items
One Query/GetItem per pattern A second call means a missing index/item Latency doubles; consistency window widens

2. Primary key design: partition/sort composition and entity overloading

DynamoDB gives you a composite primary key: a partition key (PK, decides the physical partition via an internal hash) and a sort key (SK, orders items within that partition). The power of single-table design comes from entity overloading — naming the keys generically (PK, SK) so different entity types can share the table, and encoding the type into the value with a prefix.

Here is the item collection that satisfies A1, A2, A3, and A6 — a customer and all of their orders and line items live under one partition key:

PK                  SK                       attributes
------------------- ------------------------ ----------------------------------
CUST#a1b2           PROFILE                  name, email, tier, createdAt
CUST#a1b2           ORDER#2026-06-01#o-9001  status=OPEN, total=149.00
CUST#a1b2           ORDER#2026-06-03#o-9044  status=SHIPPED, total=72.50
ORDER#o-9001        ITEM#001                 sku=ABC, qty=2, price=49.50
ORDER#o-9001        ITEM#002                 sku=XYZ, qty=1, price=50.00

The full key map for this model — every entity type and its base-table and GSI keys — is the single sheet you keep next to the code. This is “enumerate everything”: each row is one entity, and the prefixes are the contract the whole service shares:

Entity PK SK GSI1PK GSI1SK GSI2PK In sparse GSI2 when
Customer profile CUST#<id> PROFILE - - - never
Order (by customer) CUST#<id> ORDER#<date>#<oid> CUST#<id>#<status> <date>#<oid> OPEN status = OPEN
Order metadata (by order) ORDER#<oid> META - - - never
Line item ORDER#<oid> ITEM#<seq> - - - never
Payment ORDER#<oid> PAYMENT#<ts> - - - never
Address CUST#<id> ADDR#<label> - - - never
Membership (user↔group) USER#<uid> GROUP#<gid> GROUP#<gid> USER#<uid> - never
Category node CATALOG CATEGORY#<path> - - - never
Inventory level SKU#<sku> STOCK - - LOW qty < threshold
Audit event ORDER#<oid> EVENT#<ts> - - - never
Idempotency token IDEMP#<key> LOCK - - - never
Session SESSION#<sid> META USER#<uid> SESSION#<sid> - never
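The key map translates directly into an item builder. A sketch for the order row, as a plain Python dict rather than typed DynamoDB JSON; `order_item` is hypothetical, and the `GSI2SK` value is an assumption (the key map names only `GSI2PK`, though the A7 update removes both).

```python
def order_item(customer_id: str, order_id: str, order_date: str, status: str) -> dict:
    """Order row keyed for the base table, the overloaded GSI1, and the sparse GSI2."""
    item = {
        "PK": f"CUST#{customer_id}",
        "SK": f"ORDER#{order_date}#{order_id}",
        "GSI1PK": f"CUST#{customer_id}#{status}",  # A4: orders in a status for a customer
        "GSI1SK": f"{order_date}#{order_id}",
        "status": status,
    }
    if status == "OPEN":
        # Only open orders carry GSI2 keys, so only they appear in the sparse index (A5).
        item["GSI2PK"] = "OPEN"
        item["GSI2SK"] = f"{order_date}#{order_id}"  # assumed shape
    return item


open_order = order_item("a1b2", "o-9001", "2026-06-01", "OPEN")
shipped = order_item("a1b2", "o-9044", "2026-06-03", "SHIPPED")
print("GSI2PK" in open_order, "GSI2PK" in shipped)  # True False
```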

Two design choices do the heavy lifting:

  1. Prefixes make sort keys range-queryable by type. A2 (“orders for a customer, newest first”) is PK = CUST#a1b2 AND begins_with(SK, "ORDER#"), with ScanIndexForward = false to reverse the sort. Because the order date is the first sortable component of the SK (ISO-8601), newest-first falls out for free.
  2. Line items hang off the order, not the customer. A3 (“an order with its items”) is PK = ORDER#o-9001 — one query returns the order metadata row and its ITEM# rows, because they share a partition. This is the adjacency-list pattern.
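Both choices can be checked without a table. A sketch that models a Query over the example item collection: exact match on PK, `begins_with` on SK, results in ascending SK order unless reversed (the effect of `ScanIndexForward = false`). `ROWS` and `query` are hypothetical; Python compares `str` by code point, which agrees with DynamoDB's UTF-8 byte ordering for keys like these.

```python
ROWS = [  # (PK, SK) pairs from the example item collection
    ("CUST#a1b2", "PROFILE"),
    ("CUST#a1b2", "ORDER#2026-06-01#o-9001"),
    ("CUST#a1b2", "ORDER#2026-06-03#o-9044"),
    ("ORDER#o-9001", "ITEM#001"),
    ("ORDER#o-9001", "ITEM#002"),
]


def query(rows, pk, sk_prefix="", scan_index_forward=True):
    """Exact PK, optional begins_with on SK, sorted by SK like a DynamoDB Query."""
    hits = sorted(sk for p, sk in rows if p == pk and sk.startswith(sk_prefix))
    return hits if scan_index_forward else hits[::-1]


print(query(ROWS, "CUST#a1b2", "ORDER#", scan_index_forward=False))
# ['ORDER#2026-06-03#o-9044', 'ORDER#2026-06-01#o-9001']  (A2: newest first)
print(query(ROWS, "ORDER#o-9001"))
# ['ITEM#001', 'ITEM#002']  (A3: the order's line items)
```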

Writing A6 (order plus line items, atomically) uses TransactWriteItems, covered in Section 7.

# A2: all orders for a customer, newest first (DynamoDB JSON via AWS CLI)
aws dynamodb query \
  --table-name app-main \
  --key-condition-expression "PK = :pk AND begins_with(SK, :prefix)" \
  --expression-attribute-values '{":pk":{"S":"CUST#a1b2"},":prefix":{"S":"ORDER#"}}' \
  --no-scan-index-forward

The key-condition operators you can use on a sort key — and the one you cannot — decide what queries the SK supports, so choose the SK structure against this table:

SK operator Example Use for Note
= SK = "PROFILE" Exact child row Single item
begins_with begins_with(SK,"ORDER#") All children of a type The workhorse
BETWEEN SK BETWEEN "ORDER#2026-06-01" AND "ORDER#2026-06-30~" Date/range slice Needs coarse-to-fine SK; the trailing ~ keeps June 30 keys in range
<, <=, >, >= SK > "ORDER#2026-06-01" One-sided range Pagination boundaries
(none on PK) PK is always = You cannot range a PK
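Because S keys compare as strings, a BETWEEN upper bound that is just a date sorts before every longer key that starts with that date, so the last day of the range silently drops out. A sketch of the comparison in Python (whose `str` ordering agrees with DynamoDB's UTF-8 byte ordering for these keys); `between` is a hypothetical helper, the June 30 order is invented for illustration, and padding the bound with `~` is one common workaround rather than the only one.

```python
def between(sk: str, low: str, high: str) -> bool:
    """Model of SK BETWEEN :low AND :high (inclusive on both ends)."""
    return low <= sk <= high


last_day_order = "ORDER#2026-06-30#o-9100"  # hypothetical order placed on June 30

# A bare date as the upper bound excludes that day: the longer key sorts after it.
print(between(last_day_order, "ORDER#2026-06-01", "ORDER#2026-06-30"))   # False

# "~" (0x7E) sorts after "#" (0x23), so "...06-30~" covers every "...06-30#..." key.
print(between(last_day_order, "ORDER#2026-06-01", "ORDER#2026-06-30~"))  # True
```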

The data-type and encoding choices for keys are not cosmetic — they decide whether sorting and ranges behave. Pick deliberately:

Choice Options Pick when Gotcha
PK/SK type S (string), N (number), B (binary) S for prefixed overloaded keys N sorts numerically; S lexicographically
Timestamp format ISO-8601 string vs epoch number ISO-8601 in an S SK Numbers as strings sort wrong ("10" < "9")
Delimiter #, |, ~ # (convention) The delimiter must never appear inside a component value
Component order coarse → fine Range/sort on the coarse part Wrong order kills BETWEEN
Zero-padding pad numeric components in S Numbers embedded in string SKs item#7 sorts after item#10 unpadded
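A sketch of the two sort-safe encodings from the table: a UTC ISO-8601 timestamp and zero-padding. `iso_sk` and `padded_seq_sk` are hypothetical helpers, and the three-digit pad width is an assumption you would size to the largest sequence you expect.

```python
from datetime import datetime, timezone


def iso_sk(prefix: str, ts: datetime, entity_id: str) -> str:
    """Coarse-to-fine SK: type prefix, then a UTC ISO-8601 timestamp, then the id."""
    # Normalizing to UTC matters: mixed offsets would break lexicographic order.
    stamp = ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{prefix}#{stamp}#{entity_id}"


def padded_seq_sk(prefix: str, seq: int, width: int = 3) -> str:
    """Zero-pad a numeric component so string order equals numeric order."""
    return f"{prefix}#{seq:0{width}d}"


print(sorted(["ITEM#7", "ITEM#10", "ITEM#9"]))
# ['ITEM#10', 'ITEM#7', 'ITEM#9']  unpadded: string order, not numeric
print(sorted(padded_seq_sk("ITEM", n) for n in (7, 10, 9)))
# ['ITEM#007', 'ITEM#009', 'ITEM#010']
print(iso_sk("ORDER", datetime(2026, 6, 1, 17, 30, tzinfo=timezone.utc), "o-9001"))
# ORDER#2026-06-01T17:30:00Z#o-9001
```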

A practical rule for sort-key composition: order the components from coarsest to finest, and only put something in the SK if you will range-query or sort on it. ORDER#<date>#<orderId> lets you filter a date range with BETWEEN; ORDER#<orderId>#<date> does not. The trade-offs across the common key shapes themselves:

Key shape Co-location Range queries Hot-key risk Best for
PK only (no SK) None None Low (high cardinality) Pure key/value lookups
PK + simple SK Per-PK collection Yes (on SK) Depends on PK cardinality Most entities
Overloaded PK + SK Multi-entity collection Yes, by prefix Manage per entity Single-table core
Low-cardinality PK Everything in few partitions Yes High — throttles Avoid; shard instead

3. Global secondary indexes: sparse indexes, index overloading, projections

The base table answers patterns keyed on the customer or the order. A4 and A5 need a different access shape — that is what a Global Secondary Index is for: an alternate (PK, SK) over the same items, maintained asynchronously on every write.

Before the techniques, the index-type decision itself — GSI versus LSI — is one you make once at table-design time and (for LSIs) can never undo:

Property Global Secondary Index (GSI) Local Secondary Index (LSI)
Partition key Any attribute (different from base) Same PK as base table
Sort key Any attribute Alternate SK, same PK
When creatable Any time (online) Only at table creation
Max per table 20 5
Consistency Eventual only Strong or eventual
Throughput Its own (or shared on-demand) Shares the table’s
Item-collection size cap None 10 GB per PK
Single-table fit The default choice Rare; the 10 GB cap bites

Three GSI techniques carry single-table design:

Index overloading. Add generic attributes GSI1PK / GSI1SK and let each entity type populate them with whatever it needs to be found by. One physical index serves many logical patterns. For A4 (“orders in a status for a customer”), each order row (not its line items) sets:

GSI1PK = CUST#a1b2#SHIPPED      GSI1SK = 2026-06-03#o-9044

A4 becomes Query on GSI1 with GSI1PK = CUST#a1b2#SHIPPED.

Sparse indexes. An item appears in a GSI only if it has both of that index’s key attributes — a feature, not a limitation. For A5 (“all open orders across all customers”), do not index every order, only OPEN ones. Write GSI2PK = "OPEN" only while the order is open, and remove the attribute when it ships. The index then holds exactly the working set of open orders, so the ops query touches a fraction of the data. This is the canonical sparse-index pattern: a queue you Query by presence.

# A7: status -> SHIPPED; rewrites GSI1PK (it embeds the status) and REMOVES the item from the sparse "open orders" GSI
aws dynamodb update-item \
  --table-name app-main \
  --key '{"PK":{"S":"CUST#a1b2"},"SK":{"S":"ORDER#2026-06-01#o-9001"}}' \
  --update-expression "SET #s = :shipped, GSI1PK = :gsi1pk REMOVE GSI2PK, GSI2SK" \
  --expression-attribute-names '{"#s":"status"}' \
  --expression-attribute-values '{":shipped":{"S":"SHIPPED"},":gsi1pk":{"S":"CUST#a1b2#SHIPPED"}}'
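Because GSI1PK embeds the status, every status change has to rewrite it as well; otherwise A4 keeps returning the order under its old status. A sketch of a request builder that keeps the base item, GSI1, and the sparse GSI2 consistent in one UpdateItem; `status_update_params` is hypothetical, and the GSI2SK shape (`<date>#<oid>`) is an assumption.

```python
def status_update_params(customer_id: str, order_sk: str, new_status: str) -> dict:
    """Low-level UpdateItem kwargs for an order status change (hypothetical helper)."""
    sets = ["#s = :status", "GSI1PK = :gsi1pk"]  # GSI1PK embeds the status, so move it too
    values = {
        ":status": {"S": new_status},
        ":gsi1pk": {"S": f"CUST#{customer_id}#{new_status}"},
    }
    if new_status == "OPEN":
        # Re-entering OPEN puts the order back into the sparse GSI2.
        sets += ["GSI2PK = :open", "GSI2SK = :gsi2sk"]
        values[":open"] = {"S": "OPEN"}
        values[":gsi2sk"] = {"S": order_sk[len("ORDER#"):]}  # assumed shape: <date>#<oid>
        expression = "SET " + ", ".join(sets)
    else:
        # Any other status drops the GSI2 keys, removing the order from the sparse index.
        expression = "SET " + ", ".join(sets) + " REMOVE GSI2PK, GSI2SK"
    return {
        "TableName": "app-main",
        "Key": {"PK": {"S": f"CUST#{customer_id}"}, "SK": {"S": order_sk}},
        "UpdateExpression": expression,
        "ConditionExpression": "attribute_exists(PK)",  # fail rather than upsert a phantom order
        "ExpressionAttributeNames": {"#s": "status"},  # "status" is a reserved word
        "ExpressionAttributeValues": values,
    }
```

The `attribute_exists(PK)` condition matters because UpdateItem otherwise creates the item when the key does not exist.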

Projection choices. A GSI stores a copy of attributes, billed as extra storage and extra write capacity on every base-table write that touches a projected attribute. Choose deliberately:

Projection What it stores Storage / write cost Use when One-way?
KEYS_ONLY Index + base keys only Lowest You only need the key, then GetItem Widen later via new GSI
INCLUDE Keys + a named attribute list Moderate (the list) Project exactly what the query reads Cannot shrink in place
ALL Every attribute Highest Query genuinely needs the whole item Cannot shrink in place

Project narrowly and widen later: you can create a new GSI online, but you cannot shrink a projection in place. ALL on a wide, hot item is a line item you will see on the bill. The three GSI techniques mapped to the access patterns they unlock:

Technique What it does Access pattern it serves Cost lever
Index overloading Generic GSIxPK/SK per entity Many logical patterns on one index Stays within the 20-GSI cap
Sparse index Index only items that have the keys Working-set queues (open orders) Index holds a fraction of items
Narrow projection Copy only needed attributes The query’s read set Lower storage + replication WCU
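Creating an index with a narrow projection is a single UpdateTable call. A sketch that builds the request for the low-level API (for example `boto3.client("dynamodb").update_table(**params)`); `gsi_create_params` and the projected attribute list are hypothetical, and it assumes an on-demand table (a provisioned table also needs `ProvisionedThroughput` inside the `Create` block).

```python
def gsi_create_params(table: str, index: str, pk_attr: str, sk_attr: str,
                      projected: list) -> dict:
    """UpdateTable kwargs that add a GSI projecting only the named non-key attributes."""
    return {
        "TableName": table,
        "AttributeDefinitions": [  # key attributes of the new index must be declared
            {"AttributeName": pk_attr, "AttributeType": "S"},
            {"AttributeName": sk_attr, "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexUpdates": [{
            "Create": {
                "IndexName": index,
                "KeySchema": [
                    {"AttributeName": pk_attr, "KeyType": "HASH"},
                    {"AttributeName": sk_attr, "KeyType": "RANGE"},
                ],
                # INCLUDE: keys plus exactly what the query reads, nothing more.
                "Projection": {"ProjectionType": "INCLUDE", "NonKeyAttributes": projected},
            }
        }],
    }


params = gsi_create_params("app-main", "GSI1", "GSI1PK", "GSI1SK", ["status", "total"])
```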

GSIs have their own provisioned throughput (or share the table’s on-demand capacity). Critically, under provisioned mode, if a GSI is throttled, base-table writes throttle too — an under-provisioned index becomes a write bottleneck for the whole table. The GSI behaviors that surprise people, and how to keep them from biting:

GSI behavior The surprise How to handle it
Eventually consistent only No strongly-consistent GSI read exists GetItem the base item if you need strong consistency
Separate throughput Under-provision → base writes throttle Size GSI WCU ≥ base write rate touching its keys
Projection replication Every projected-attr write costs index WCU Project narrowly; avoid ALL on hot items
Sparse by key presence Item silently absent if a key attr is missing Set/remove the key attribute deliberately
Backfill on creation New GSI ignores existing rows Backfill before querying (Section 8)

4. Modeling relationships: adjacency lists, hierarchies, many-to-many

Single-table design models relationships by placement, not by joins. The three relationship cardinalities each have a canonical encoding:

Relationship Relational answer DynamoDB encoding Read it with
One-to-many Foreign key + join Parent + children share a PK (adjacency list) One Query on the PK
Hierarchy / tree Recursive self-join Path in the SK (A#B#C) begins_with on the path prefix
Many-to-many Join table Materialize edge + flip with a GSI Base for one direction, GSI for the other

One-to-many (adjacency list). Already shown: parent and children share a partition (ORDER#o-9001 owns its ITEM# rows). One Query returns the aggregate.

Hierarchies. Encode the path in the sort key. A category tree — Electronics > Audio > Headphones — stores SK = CATEGORY#Electronics#Audio#Headphones. begins_with(SK, "CATEGORY#Electronics#Audio") returns the whole subtree in one query, because lexicographic ordering on the delimited path mirrors the tree.
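The prefix match has one sharp edge worth seeing: `begins_with` matches characters, not path segments, so a sibling whose name extends the prefix (the hypothetical `AudioVisual` node below) leaks into the subtree. A sketch over an in-memory list of category SKs; `begins_with` here is a hypothetical stand-in for the key condition.

```python
CATEGORY_SKS = sorted([  # hypothetical category nodes under PK = CATALOG
    "CATEGORY#Electronics",
    "CATEGORY#Electronics#Audio",
    "CATEGORY#Electronics#Audio#Headphones",
    "CATEGORY#Electronics#Audio#Speakers",
    "CATEGORY#Electronics#AudioVisual",
    "CATEGORY#Garden",
])


def begins_with(sks, prefix):
    """Stand-in for a begins_with(SK, :prefix) key condition."""
    return [sk for sk in sks if sk.startswith(prefix)]


# Without a trailing delimiter, the unrelated AudioVisual node is included.
print(begins_with(CATEGORY_SKS, "CATEGORY#Electronics#Audio"))
# With it, only true descendants match; fetch the Audio node itself by exact key.
print(begins_with(CATEGORY_SKS, "CATEGORY#Electronics#Audio#"))
```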

Many-to-many. The relational answer is a join table; the DynamoDB answer is to materialize both directions of the edge and flip them with a GSI. For users-in-groups: store a membership item, then use GSI1 to invert PK and SK.

PK              SK              GSI1PK          GSI1SK
--------------- --------------- --------------- ---------------
USER#u1         GROUP#g1        GROUP#g1        USER#u1
USER#u1         GROUP#g2        GROUP#g2        USER#u1
USER#u2         GROUP#g1        GROUP#g1        USER#u2
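The flip can be modeled in a few lines: each membership is written once with both key pairs, the base-table keys answer “groups for a user”, and the GSI1 keys answer “users in a group”. A sketch over plain dicts; `membership` and the two lookup functions are hypothetical stand-ins for a base-table Query and a GSI1 Query.

```python
def membership(user_id: str, group_id: str) -> dict:
    """One item per edge; GSI1 stores the same edge with PK and SK swapped."""
    return {
        "PK": f"USER#{user_id}", "SK": f"GROUP#{group_id}",
        "GSI1PK": f"GROUP#{group_id}", "GSI1SK": f"USER#{user_id}",
    }


ITEMS = [membership("u1", "g1"), membership("u1", "g2"), membership("u2", "g1")]


def groups_for_user(items, user_id):
    """Base table: Query PK = USER#<uid>."""
    return sorted(i["SK"] for i in items if i["PK"] == f"USER#{user_id}")


def users_in_group(items, group_id):
    """GSI1: Query GSI1PK = GROUP#<gid>."""
    return sorted(i["GSI1SK"] for i in items if i.get("GSI1PK") == f"GROUP#{group_id}")


print(groups_for_user(ITEMS, "u1"))  # ['GROUP#g1', 'GROUP#g2']
print(users_in_group(ITEMS, "g1"))   # ['USER#u1', 'USER#u2']
```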

One item, two access directions, no second write to keep in sync. When the relationship carries denormalized data (a group name shown on the user’s view), accept the duplication and reconcile it with DynamoDB Streams (Section 8) rather than reading two items per query. Denormalization is a deliberate trade — copy data to save a read, then keep the copies honest:

Denormalization decision Read-time benefit Write-time cost Reconcile with
Copy group name onto membership item No second read for the label Update fan-out on rename Streams Lambda updates copies
Store order total on customer row Dashboard avoids summing items Update on every item change Transaction or Streams
Duplicate edge both directions Both query directions are one call Two writes (or one + GSI flip) GSI flip needs no extra write
Keep a small “latest” summary item Cheap dashboard read One extra write per change Streams maintains it
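The Streams side of the first row can be sketched as a pure function over the event a Lambda function receives from a DynamoDB stream. It assumes the stream's view type includes both images (NEW_AND_OLD_IMAGES) and a hypothetical group entity keyed `PK = GROUP#<gid>`, `SK = META` with a `name` attribute, which the key map above does not define. `group_renames` only detects what changed; the fan-out (Query GSI1 for the group's memberships, then update each copy) is left out.

```python
def group_renames(event: dict) -> list:
    """Return (group_id, new_name) for every group item whose name changed in this batch."""
    renames = []
    for record in event.get("Records", []):
        if record.get("eventName") != "MODIFY":
            continue  # inserts and deletes need no rename reconciliation
        images = record.get("dynamodb", {})
        new, old = images.get("NewImage", {}), images.get("OldImage", {})
        pk = new.get("PK", {}).get("S", "")
        if not pk.startswith("GROUP#") or new.get("SK", {}).get("S") != "META":
            continue  # not the (assumed) group entity
        new_name = new.get("name", {}).get("S")
        if new_name is not None and new_name != old.get("name", {}).get("S"):
            renames.append((pk[len("GROUP#"):], new_name))
    return renames
```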

5. Write sharding to avoid hot partitions and throttling

DynamoDB spreads data across partitions by hashing the partition key. Two failure modes follow: a single partition key taking disproportionate traffic (a hot key), and the hard physical ceiling — a single partition sustains roughly 1,000 WCU and 3,000 RCU. Exceed either and you throttle, even if table-level capacity looks healthy.

Adaptive capacity helps but does not excuse key design. DynamoDB shifts capacity toward busy partitions and can isolate a single hot item, but it cannot exceed those per-partition limits. A key that needs more than 1,000 WCU must spread across multiple physical partitions — which means more than one partition-key value.

Time-series keys are the classic trap. A PK of the current date sends every write today to one partition; yesterday’s is cold. If you must key on time, write-shard: append a calculated suffix to fan writes across N logical partitions.

import hashlib

SHARDS = 10  # tune to required WCU / 1000, rounded up

def shard_suffix(item_id: str, shards: int = SHARDS) -> int:
    # Deterministic so the read side can recompute it
    h = hashlib.md5(item_id.encode()).hexdigest()
    return int(h, 16) % shards

# write: PK = "ORDER#<date>#<shard>", e.g. "ORDER#2026-06-08#7"
pk = f"ORDER#2026-06-08#{shard_suffix('o-9001')}"

The trade-off is explicit: reading all of today’s orders now means N queries (PK = ORDER#2026-06-08#0 through ORDER#2026-06-08#9 when N = 10) merged client-side. Sharding trades read fan-out for write throughput. Two ways to pick the suffix:

Suffix strategy How it is computed Read side Best for
Calculated Hash of a key attribute mod N Recompute the exact shard for a point read Read-by-ID workloads
Random random.randint(0, N-1) Scatter across all N shards Write-then-batch-read workloads
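The read side of write sharding is a scatter-gather. A sketch, assuming shards are numbered `0` to `N-1` as in the suffix function above and that each shard's Query returns items in ascending SK order (DynamoDB's default); `shard_pks`, `fan_out_read`, and the `query_shard` callable are hypothetical, and the shards are queried one after another here where a real reader would issue the queries concurrently.

```python
import heapq
from typing import Callable, Iterable


def shard_pks(base_pk: str, shards: int) -> list:
    """Every sharded PK a reader must query, e.g. ORDER#2026-06-08#0 .. #9."""
    return [f"{base_pk}#{n}" for n in range(shards)]


def fan_out_read(base_pk: str, shards: int,
                 query_shard: Callable[[str], Iterable[dict]]) -> list:
    """Query every shard and merge the already SK-sorted results into one ordered list."""
    per_shard = [query_shard(pk) for pk in shard_pks(base_pk, shards)]
    return list(heapq.merge(*per_shard, key=lambda item: item["SK"]))


fake = {  # hypothetical per-shard Query results
    "ORDER#2026-06-08#0": [{"SK": "o-9001"}, {"SK": "o-9050"}],
    "ORDER#2026-06-08#1": [{"SK": "o-9010"}],
}
merged = fan_out_read("ORDER#2026-06-08", 2, lambda pk: fake.get(pk, []))
print([i["SK"] for i in merged])  # ['o-9001', 'o-9010', 'o-9050']
```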

Sizing the shard count is arithmetic, not guesswork — pick N from the peak write rate of the hottest key:

Required WCU on one logical key Min shards (WCU / 1000, rounded up) Read fan-out cost Note
≤ 1,000 1 (no shard) 1 query A single partition suffices
~2,500 3 3 queries merged Round up, leave headroom
~9,000 9 (10 with headroom) 10 queries merged The common default
~25,000 25 25 queries merged Reconsider the key entirely
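The sizing rule is one line of arithmetic. A sketch; `min_shards` and its optional `headroom` fraction are hypothetical, and the 1,000 WCU figure is the approximate per-partition ceiling, not a guaranteed number.

```python
import math

PARTITION_WCU = 1_000  # approximate per-partition write ceiling


def min_shards(peak_wcu: float, headroom: float = 0.0) -> int:
    """Shards needed so each stays under the per-partition ceiling.

    headroom is an optional safety fraction (0.1 = plan for 10% above peak).
    """
    return max(1, math.ceil(peak_wcu * (1 + headroom) / PARTITION_WCU))


print([min_shards(w) for w in (800, 2_500, 9_000, 25_000)])  # [1, 3, 9, 25]
print(min_shards(9_000, headroom=0.1))                       # 10
```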

The hot-partition symptoms and how to read them apart from genuine under-provisioning:

Signal Hot partition Genuinely under-provisioned
ThrottledRequests > 0 > 0
ConsumedWriteCapacityUnits vs provisioned Well below provisioned At/above provisioned
Contributor Insights top key One key dominates Traffic spread evenly
Fix that works Write-shard the key Raise capacity / on-demand

Diagnose hot partitions with CloudWatch Contributor Insights for DynamoDB, which surfaces the most-accessed partition keys. ThrottledRequests on a table that is nowhere near its provisioned total is the signature of a hot key, not insufficient capacity.

6. Capacity modes: on-demand vs provisioned with auto scaling

Two billing models, and the choice is about traffic shape, not just volume.

On-demand bills per request, scales instantly, and needs zero capacity planning. It keeps prior peaks warm so it can double instantly from the previous high-water mark, but a genuine cold 10x spike can still throttle for a moment. Use it for new tables (unknown traffic), spiky workloads, and dev/test.

Provisioned reserves RCU/WCU and is materially cheaper per request for steady, predictable load — but you pay for that capacity whether you use it or not. Pair it with Application Auto Scaling, which tracks a target utilization (typically 70%) between a min and max. It reacts on a CloudWatch alarm timescale (a minute or two): good for gentle diurnal curves, poor at absorbing sharp spikes.

Dimension On-demand Provisioned + auto scaling
Billing Per request (RRU/WRU) Per provisioned RCU/WCU-hour
Capacity planning None Set min/max + target %
Spike response Instant up to ~2× prior peak Minutes (alarm-driven)
Cost at steady high load Higher per request Lower (esp. with reserved)
Cost at low/idle Pay only for use Pay the provisioned floor
Best for New, spiky, dev/test Predictable, steady, diurnal
Switch frequency Once per 24h between modes

# Provisioned table with target-tracking auto scaling (Terraform)
resource "aws_dynamodb_table" "main" {
  name           = "app-main"
  billing_mode   = "PROVISIONED"
  hash_key       = "PK"
  range_key      = "SK"
  read_capacity  = 50
  write_capacity = 50

  # Each attribute block needs its own lines: HCL rejects multiple arguments in a one-line block.
  attribute {
    name = "PK"
    type = "S"
  }

  attribute {
    name = "SK"
    type = "S"
  }
}

resource "aws_appautoscaling_target" "write" {
  service_namespace  = "dynamodb"
  resource_id        = "table/${aws_dynamodb_table.main.name}"
  scalable_dimension = "dynamodb:table:WriteCapacityUnits"
  min_capacity       = 50
  max_capacity       = 2000
}

resource "aws_appautoscaling_policy" "write" {
  name               = "write-target-70"
  service_namespace  = aws_appautoscaling_target.write.service_namespace
  resource_id        = aws_appautoscaling_target.write.resource_id
  scalable_dimension = aws_appautoscaling_target.write.scalable_dimension
  policy_type        = "TargetTrackingScaling"

  target_tracking_scaling_policy_configuration {
    target_value = 70.0
    predefined_metric_specification {
      predefined_metric_type = "DynamoDBWriteCapacityUtilization"
    }
  }
}

The auto-scaling knobs and sensible starting points, so the table reacts without thrashing:

| Auto-scaling setting | What it controls | Starting point | Trade-off |
|---|---|---|---|
| Target utilization | Headroom above current use | 70% | Lower = more headroom, more cost |
| Min capacity | The floor (buy as reserved) | Your baseline | Too low → throttle on the ramp |
| Max capacity | Hard ceiling | 2–4× baseline | Too low → throttle at peak |
| Scale-out cooldown | Wait before scaling up again | Short (seconds–1 min) | Too long → slow to absorb a ramp |
| Scale-in cooldown | Wait before scaling down | Longer (minutes) | Too short → flap on jitter |
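As a rough mental model of what target tracking aims for, desired capacity is consumed capacity divided by the target utilization, clamped to the min/max you configured. This is a simplification (the real service also applies alarm evaluation periods and cooldowns), and `target_capacity` is an illustrative helper, not an AWS API:

```python
def target_capacity(consumed: int, target_pct: int,
                    min_cap: int, max_cap: int) -> int:
    """Simplified target-tracking estimate: the capacity at which
    `consumed` would sit at `target_pct` percent utilization,
    clamped to the configured [min_cap, max_cap] range."""
    if not 0 < target_pct <= 100:
        raise ValueError("target_pct must be in (0, 100]")
    # Integer ceiling division avoids float rounding surprises in ceil().
    desired = -(-consumed * 100 // target_pct)
    return max(min_cap, min(max_cap, desired))


# 700 WCU consumed at a 70% target: aim for 1,000 WCU provisioned.
print(target_capacity(700, 70, min_cap=50, max_cap=2000))   # 1000
# A 1,800 WCU surge would want 2,572 WCU, but max_capacity caps it at 2,000.
print(target_capacity(1800, 70, min_cap=50, max_cap=2000))  # 2000
```

The second call is the "Max capacity too low → throttle at peak" row in action: the policy wants more than the ceiling allows.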

For a stable baseline, reserved capacity discounts that floor in exchange for a one- or three-year commitment — buy it for the min, let auto scaling handle the rest. You can switch a table between on-demand and provisioned only once every 24 hours, so the mode is not a runtime knob. Match the capacity mode to the traffic shape with this decision table:

| If your traffic is… | Then choose… | Because |
|---|---|---|
| Brand new / unknown | On-demand | No data to size provisioned from |
| Spiky / unpredictable | On-demand | Scales instantly to ~2× prior peak with nothing to tune |
| Steady with a diurnal curve | Provisioned + auto scaling | Cheaper per request, scaling rides the curve |
| Steady with a known floor | Provisioned + reserved for the floor | Reserved discount on guaranteed baseline |
| Flash-sale / sharp 10× spikes | On-demand (or pre-scaled provisioned) | Auto scaling can’t react fast enough |

7. Transactions, condition expressions, and optimistic concurrency

A PutItem is atomic for one item and immediately visible to strongly-consistent reads on the base table. The harder guarantees come from three tools.

Condition expressions make a write conditional and reject it atomically otherwise. The most important one prevents blind overwrites: attribute_not_exists(PK) makes PutItem an insert, failing with ConditionalCheckFailedException if the item already exists. The functions and operators you compose conditions from:

| Condition function / operator | Meaning | Canonical use |
|---|---|---|
| attribute_not_exists(PK) | Item/attr does not exist | Insert-only (no overwrite) |
| attribute_exists(PK) | Item/attr exists | Update-only (must already be there) |
| attribute_type(a, :t) | Attribute is of a type | Defensive schema checks |
| begins_with(a, :p) | String prefix | Guard on a structured value |
| <, <=, =, >=, >, <> | Comparisons | Version / counter guards |
| AND, OR, NOT | Boolean composition | Multi-condition guards |

Optimistic concurrency uses a version attribute so a lost update is rejected rather than silently clobbered:

# Update only if version is unchanged; bump it in the same call
aws dynamodb update-item \
  --table-name app-main \
  --key '{"PK":{"S":"ORDER#o-9001"},"SK":{"S":"META"}}' \
  --update-expression "SET #st = :new, version = :nextv" \
  --condition-expression "version = :curv" \
  --expression-attribute-names '{"#st":"status"}' \
  --expression-attribute-values '{":new":{"S":"PAID"},":curv":{"N":"7"},":nextv":{"N":"8"}}'

If another writer bumped version to 8 first, the condition version = 7 fails; you re-read and retry. No locks, no contention beyond the conflicting writers.
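The same read-modify-write loop, sketched in Python against an in-memory stand-in so it runs without AWS. `FakeTable`, `update_if_version`, and `ConditionalCheckFailed` are hypothetical stand-ins for UpdateItem with `ConditionExpression="version = :curv"` and its ConditionalCheckFailedException; production code would also sleep with jittered backoff between attempts:

```python
from dataclasses import dataclass, field
from typing import Callable


class ConditionalCheckFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""


@dataclass
class FakeTable:
    """In-memory stand-in; update_if_version mimics UpdateItem with
    ConditionExpression 'version = :curv' that also bumps the version."""
    items: dict = field(default_factory=dict)

    def get(self, key) -> dict:
        return dict(self.items[key])  # a copy, like a fresh GetItem

    def update_if_version(self, key, expected: int, changes: dict) -> None:
        current = self.items[key]
        if current["version"] != expected:
            raise ConditionalCheckFailed(key)
        self.items[key] = {**current, **changes, "version": expected + 1}


def update_with_retry(table: FakeTable, key, mutate: Callable[[dict], dict],
                      max_attempts: int = 5) -> dict:
    """Optimistic concurrency: read, compute the change, write conditioned
    on the version that was read; on conflict, re-read and re-apply."""
    for _ in range(max_attempts):
        item = table.get(key)
        try:
            table.update_if_version(key, item["version"], mutate(item))
            return table.get(key)
        except ConditionalCheckFailed:
            continue  # another writer won the race; start from a fresh read
    raise RuntimeError(f"gave up after {max_attempts} conflicting attempts")


key = ("ORDER#o-9001", "META")
orders = FakeTable({key: {"status": "OPEN", "version": 7}})
print(update_with_retry(orders, key, lambda item: {"status": "PAID"}))
# {'status': 'PAID', 'version': 8}
```

The important design choice is that `mutate` is re-run on every attempt against the freshly read item, so a retry never writes a change computed from stale data.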

TransactWriteItems gives all-or-nothing across up to 100 items (and multiple tables), each with its own condition. This is how A6 inserts an order and its line items atomically:

{
  "TransactItems": [
    { "Put": {
        "TableName": "app-main",
        "Item": {"PK":{"S":"CUST#a1b2"},"SK":{"S":"ORDER#2026-06-08#o-9100"},"status":{"S":"OPEN"}},
        "ConditionExpression": "attribute_not_exists(PK)"
    }},
    { "Put": {
        "TableName": "app-main",
        "Item": {"PK":{"S":"ORDER#o-9100"},"SK":{"S":"ITEM#001"},"sku":{"S":"ABC"}}
    }}
  ]
}

Two costs to internalize: a transactional write consumes 2x the WCU of the same non-transactional write (prepare plus commit), and a transaction fails entirely if any condition fails or it collides with another transaction on the same item (TransactionCanceledException, with per-item reasons). The four write-consistency tools, side by side, so you reach for the cheapest one that gives the guarantee:

| Tool | Guarantee | Cost | Use when |
|---|---|---|---|
| Plain PutItem/UpdateItem | Single-item atomic | 1× WCU | No cross-item invariant |
| Condition expression | Atomic conditional (insert/guard) | 1× WCU (failed write still charges) | Prevent overwrite / enforce a predicate |
| Optimistic concurrency (version) | No lost update | 1× WCU + retry on conflict | Concurrent updates to one item |
| TransactWriteItems | All-or-nothing across ≤100 items | 2× WCU | Multi-item invariant (order + items) |

Use transactions where you need the invariant; do not wrap every write in one.
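To make the 100-action cap and the 2× WCU charge concrete, here is a minimal sketch that builds the same kind of order-plus-line-items request in low-level DynamoDB JSON and estimates its write cost. The helper names are hypothetical; the returned dict is shaped so it could be passed as `boto3.client("dynamodb").transact_write_items(**request)`:

```python
import math
from typing import List

MAX_TRANSACT_ITEMS = 100  # TransactWriteItems per-request action limit


def order_transaction(table: str, cust_id: str, order_id: str,
                      order_date: str, skus: List[str]) -> dict:
    """Build a TransactWriteItems request in low-level DynamoDB JSON:
    the order header plus one ITEM# row per SKU, all-or-nothing."""
    actions = [{"Put": {
        "TableName": table,
        "Item": {"PK": {"S": f"CUST#{cust_id}"},
                 "SK": {"S": f"ORDER#{order_date}#{order_id}"},
                 "status": {"S": "OPEN"}},
        # Insert-only: the whole transaction fails if this order exists.
        "ConditionExpression": "attribute_not_exists(PK)",
    }}]
    for n, sku in enumerate(skus, start=1):
        actions.append({"Put": {
            "TableName": table,
            "Item": {"PK": {"S": f"ORDER#{order_id}"},
                     "SK": {"S": f"ITEM#{n:03d}"},
                     "sku": {"S": sku}},
        }})
    if len(actions) > MAX_TRANSACT_ITEMS:
        raise ValueError(f"{len(actions)} actions exceed the "
                         f"{MAX_TRANSACT_ITEMS}-action transaction limit")
    return {"TransactItems": actions}


def transactional_wcu(item_sizes_bytes: List[int]) -> int:
    """Each write costs ceil(size / 1 KB) WCU; transactions charge twice that."""
    return 2 * sum(math.ceil(size / 1024) for size in item_sizes_bytes)
```

An order with 99 line items is the largest that fits in one transaction here; anything bigger needs a different invariant strategy rather than a bigger transaction.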

8. Migrations and backfills: evolving the schema without downtime

Single-table schemas evolve constantly — a new access pattern means a new GSI or a derived attribute. DynamoDB is schemaless at the item level, so adding attributes needs no migration. The work is in indexes and backfills.

Adding a GSI is an online operation. UpdateTable with a GSI create returns immediately; DynamoDB backfills it in the background while the table stays fully available. The index reports CREATING then ACTIVE: do not query it until ACTIVE, and watch OnlineIndexPercentageProgress. A table allows at most 20 GSIs by default (an adjustable quota), with only one create or delete in flight at a time.

# On a PROVISIONED table, the Create block also needs a ProvisionedThroughput entry.
aws dynamodb update-table \
  --table-name app-main \
  --attribute-definitions \
      AttributeName=GSI3PK,AttributeType=S AttributeName=GSI3SK,AttributeType=S \
  --global-secondary-index-updates '[{
    "Create": {
      "IndexName": "GSI3",
      "KeySchema": [
        {"AttributeName":"GSI3PK","KeyType":"HASH"},
        {"AttributeName":"GSI3SK","KeyType":"RANGE"}
      ],
      "Projection": {"ProjectionType":"INCLUDE","NonKeyAttributes":["status","total"]}
    }
  }]'

A new GSI only contains items that already carry GSI3PK/GSI3SK. Existing items stay invisible to it until you write those attributes — that is the backfill. The migration techniques, and when each is the right tool:

| Technique | Touches live capacity? | Best for | Watch-out |
|---|---|---|---|
| Add attributes (schemaless) | Minimal | New optional fields | Old items lack the field |
| Online GSI create | Background backfill | New access pattern on new attrs | Query only after ACTIVE |
| Parallel Scan + UpdateItem | Yes: rate-limit it | Backfilling new key attrs | Throttle; make idempotent |
| DynamoDB Streams → Lambda | Incremental, ongoing | Continuous reshaping/denorm | At-least-once; dedupe |
| Export to S3 + transform + BatchWriteItem | Reads: no (export consumes no RCU); the write-back still consumes WCU | One-time bulk transform | Re-import path; eventual cutover |

Backfill with a throttled job, not a full-speed Scan-and-update loop that melts capacity. The pattern: a parallel Scan with Segment/TotalSegments, transform each item, UpdateItem the new attributes, all rate-limited against provisioned capacity. AWS Glue, Step Functions, or a Lambda fan-out are the usual harnesses. Make the transform idempotent (a condition like attribute_not_exists(GSI3PK) so re-runs skip items already done) and write-shard the target if the new key would be hot.
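A minimal sketch of one worker in that harness, assuming a boto3 low-level DynamoDB client (`client.scan`, `client.update_item`, and `client.exceptions.ConditionalCheckFailedException` are real boto3 surfaces). The status-based GSI3 key derivation is hypothetical, and the fixed sleep is a deliberately crude rate limit; run one worker per `Segment` value in parallel:

```python
import time


def derive_gsi3_values(item: dict):
    """Hypothetical transform: key order items by status on GSI3.
    Returns ExpressionAttributeValues for the update, or None to skip."""
    sk = item["SK"]["S"]
    if not sk.startswith("ORDER#") or "status" not in item:
        return None
    return {":pk": {"S": f"STATUS#{item['status']['S']}"}, ":sk": {"S": sk}}


def backfill_segment(client, table: str, segment: int, total_segments: int,
                     max_writes_per_sec: float) -> int:
    """Backfill GSI3PK/GSI3SK for one parallel-Scan segment.
    Idempotent via attribute_not_exists, paced to a write budget.
    Returns how many items this run actually updated."""
    scan_args = {"TableName": table, "Segment": segment,
                 "TotalSegments": total_segments}
    pause = 1.0 / max_writes_per_sec
    updated = 0
    while True:
        page = client.scan(**scan_args)
        for item in page.get("Items", []):
            values = derive_gsi3_values(item)
            if values is None:
                continue
            try:
                client.update_item(
                    TableName=table,
                    Key={"PK": item["PK"], "SK": item["SK"]},
                    UpdateExpression="SET GSI3PK = :pk, GSI3SK = :sk",
                    ConditionExpression="attribute_not_exists(GSI3PK)",
                    ExpressionAttributeValues=values,
                )
                updated += 1
            except client.exceptions.ConditionalCheckFailedException:
                pass  # an earlier run already did this item
            time.sleep(pause)  # crude pacing; a token bucket is smoother
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return updated
        # Persist last_key durably here to make the job resumable.
        scan_args["ExclusiveStartKey"] = last_key
```

Re-running the same segment is safe: already-backfilled items fail the condition and are skipped, so a second pass reports zero updates.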

For continuous reshaping, use DynamoDB Streams. A Lambda reacts to every change to keep a denormalized copy or new index attribute current; it is the same machinery that reconciles the many-to-many duplication from Section 4. For a one-time bulk transform across a huge table, export to S3 (a point-in-time export that requires PITR and consumes no read capacity), transform with Athena or Glue, and then either BatchWriteItem the result back, which keeps the read side off the live table but still consumes WCU, or import it into a new table with DynamoDB's import from S3 and cut over, which keeps the migration entirely off the live table's capacity. A backfill is correct only if it is safe to re-run. The idempotency checklist:

| Backfill property | Why it matters | How to ensure it |
|---|---|---|
| Idempotent | Re-runs and overlaps must not double-apply | Condition attribute_not_exists(GSI3PK) |
| Rate-limited | A full-speed Scan melts provisioned capacity | Cap WCU/RCU; use Limit; back off on throttle |
| Reconcilable | You must prove it finished | --select COUNT old vs new agrees |
| Resumable | Big tables take hours | Segment-based parallel Scan checkpoints |
| Off-path for huge tables | Don’t compete with live traffic | Export to S3, transform, re-import |

Architecture at a glance

The diagram traces a single-table design the way the data and control actually move, left to right, and pins the five classic failure points onto the exact node where each bites. Start at the left: your service code holds the access-pattern list (A1–A7) and issues Query/GetItem (never Scan) against the base table app-main, whose overloaded PK=CUST#/SK=ORDER# keys let a customer, their orders, and the orders’ line items share one item collection so a single query returns the aggregate. Reads that need a different shape hit the secondary indexes — an overloaded GSI1 (CUST#STATUS, INCLUDE projection) and a sparse GSI2 holding only OPEN orders as a working-set queue. Writes flow down the write path: high-volume keys go through a write-shard (#0..#9) to fan across physical partitions, multi-item invariants use TransactWriteItems (≤100 items, 2× WCU), and everything is encrypted at rest with a KMS CMK. Finally, DynamoDB Streams drives a Lambda that reconciles duplicates and backfills new GSI keys, while CloudWatch Contributor Insights watches for hot keys and ThrottledRequests.

Read the numbered badges as the diagnostic map laid over that architecture. Badge 1 sits on the physical partition — the per-partition ~1,000 WCU / ~3,000 RCU ceiling where a hot key throttles while the table looks idle. Badge 2 sits on the sparse GSI, where an under-provisioned index throttles the base table’s writes. Badge 3 sits on the item collection, the 400 KB-per-item limit you hit by appending to an unbounded array. Badge 4 sits on the Streams/ETL node, the backfill gap where a new GSI silently misses historical rows. Badge 5 sits back on the service code, where a new access pattern degrades into a Scan. The legend narrates each as symptom · confirm · fix — the same method as the playbook section below: localize the failure to a node, confirm with the named metric or exception, apply the keys-or-capacity fix.

DynamoDB single-table-design data architecture flowing left to right: service code holding the A1-A7 access-pattern list issues Query/GetItem against the base table app-main with overloaded PK=CUST#/SK=ORDER# keys forming item collections, fanning to an overloaded GSI1 and a sparse GSI2 of open orders; the write path runs through a write-shard (#0..#9), TransactWriteItems, and a KMS CMK; DynamoDB Streams drives a reconcile/backfill Lambda and CloudWatch Contributor Insights watches hot keys and ThrottledRequests; five numbered badges mark the failure points — hot partition throttle at the per-partition 1000 WCU/3000 RCU ceiling, sparse-GSI throttle of the base table, the 400 KB item limit on the item collection, the backfill gap on a new GSI, and Scan creep at the service code

Real-world scenario

Northwind Logistics runs a parcel-tracking platform on a single DynamoDB table tracking-main, keyed PK = SHIPMENT#<id>, SK = EVENT#<timestamp>, with a customer-and-shipments item collection alongside. On-demand capacity, point-in-time recovery on, a sparse GSI for “in-transit shipments,” and a Streams Lambda feeding an OpenSearch index for the support console. Average load is 3,000 writes/second of scan events; the data team is three engineers and the table had run clean for two years.

Peak season broke it on a single Monday, with two failures at once. First, a handful of mega-shipments (palletized freight with tens of thousands of scanned parcels) accumulated their events under one PK = SHIPMENT#<id>, and those partitions began throttling, with ThrottledRequests climbing while the table sat at roughly 40% of its on-demand high-water mark. The on-call engineer’s reflex was to assume under-provisioning and consider raising limits, but the table was nowhere near its ceiling. Contributor Insights told the truth: three shipment IDs dominated the most-throttled-key list. This was the per-partition ~1,000 WCU ceiling on a hot key, not table capacity. Second, a per-shipment rollup item that appended each milestone to an events array started failing writes with ValidationException: Item size has exceeded the maximum allowed size. It had finally crossed 400 KB.

The breakthrough was naming the two failures precisely instead of reaching for the capacity slider. The hot partition was a key-design problem; the 400 KB error was a modeling problem. Neither is fixed by more capacity. The team confirmed the hot key with Contributor Insights and the gap between consumed capacity and the table’s previous peak, and confirmed the item-size failure by logging item bytes before each rollup write.
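Logging item bytes before a write needs a size estimate. Below is a rough sketch based on DynamoDB's documented sizing rules (attribute names count toward size, strings by UTF-8 bytes, numbers roughly one byte per two significant digits plus one); the list/map overhead constants and the 90% safety margin are assumptions chosen to err high, and the service itself remains the final arbiter:

```python
import math
from decimal import Decimal

MAX_ITEM_BYTES = 400 * 1024  # DynamoDB's hard per-item limit


def _value_size(value) -> int:
    """Approximate stored size of one plain-Python attribute value."""
    if value is None or isinstance(value, bool):
        return 1
    if isinstance(value, str):
        return len(value.encode("utf-8"))
    if isinstance(value, (bytes, bytearray)):
        return len(value)
    if isinstance(value, (int, float, Decimal)):
        digits = "".join(map(str, Decimal(str(value)).as_tuple().digits))
        significant = digits.strip("0") or "0"
        return 1 + math.ceil(len(significant) / 2)
    if isinstance(value, (list, tuple)):
        return 3 + sum(1 + _value_size(v) for v in value)
    if isinstance(value, dict):
        return 3 + sum(1 + len(k.encode("utf-8")) + _value_size(v)
                       for k, v in value.items())
    raise TypeError(f"unsupported type {type(value).__name__}")


def approx_item_size(item: dict) -> int:
    """Rough byte size of an item, attribute names included."""
    return sum(len(name.encode("utf-8")) + _value_size(v)
               for name, v in item.items())


def assert_fits(item: dict, margin: float = 0.9) -> int:
    """Fail fast, before the write, when an item nears the 400 KB ceiling."""
    size = approx_item_size(item)
    if size > MAX_ITEM_BYTES * margin:
        raise ValueError(f"item ~{size} bytes is too close to the 400 KB limit")
    return size
```

Calling `assert_fits` just before each rollup write turns a surprise ValidationException into an early, logged warning while there is still headroom to re-model.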

The fix landed in two changes, both deployable without downtime. First, write-shard the event partition for high-volume shipments only — PK = SHIPMENT#<id>#<shard> with a calculated suffix from the event ID, fanning hot shipments across 10 partitions while small shipments stayed single-partition so their reads remained one query. Full-history reads for a mega-shipment became 10 parallel queries merged client-side — acceptable because that read was rare and the per-event writes were the hot path.

import hashlib

# Shard only high-volume shipments; keep small ones single-partition
# so their reads stay a single query.
def event_pk(shipment_id: str, event_id: str, high_volume: bool) -> str:
    if not high_volume:
        return f"SHIPMENT#{shipment_id}"
    shard = int(hashlib.md5(event_id.encode()).hexdigest(), 16) % 10
    return f"SHIPMENT#{shipment_id}#{shard}"
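The read side of the same scheme is a scatter-gather: query every shard, then merge. A minimal sketch, assuming items come back as plain dicts with a string `SK` (as the boto3 Table resource returns them) and that `query_shard` is a hypothetical caller-supplied function that fully paginates one Query and returns its items in ascending SK order:

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def read_all_events(shipment_id: str,
                    query_shard: Callable[[str], List[dict]],
                    shards: int = 10) -> List[dict]:
    """Scatter-gather read of a write-sharded shipment.

    query_shard(pk) must return every item for one shard partition key,
    sorted by SK ascending. The per-shard lists are k-way merged into
    a single SK-ordered list."""
    pks = [f"SHIPMENT#{shipment_id}#{n}" for n in range(shards)]
    with ThreadPoolExecutor(max_workers=shards) as pool:
        per_shard = list(pool.map(query_shard, pks))  # queries run concurrently
    return list(heapq.merge(*per_shard, key=lambda item: item["SK"]))
```

`heapq.merge` only works because each shard's Query already returns its items sorted by SK; the merge costs O(n log shards) instead of re-sorting everything.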

Second, they stopped appending to the rollup array. Each milestone became its own item under the adjacency-list SK (SK = MILESTONE#<seq>), sidestepping the 400 KB ceiling entirely, while a Streams Lambda maintained a small fixed-size “latest status” summary item for the dashboard read. The next peak ran at 3,400 writes/second with zero ThrottledRequests and no ValidationException, and because the writes spread evenly the on-demand bill actually dropped slightly versus the throttled-and-retrying weeks before.
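One detail the MILESTONE#<seq> key needs: DynamoDB orders string sort keys by UTF-8 byte order, so an unpadded sequence sorts MILESTONE#10 before MILESTONE#2. A small sketch of a zero-padded key builder; the helper name and the six-digit width are assumptions, so size the width to the largest sequence you expect:

```python
def milestone_sk(seq: int, width: int = 6) -> str:
    """Zero-pad the sequence so string order matches numeric order."""
    if seq < 0 or seq >= 10 ** width:
        raise ValueError(f"seq {seq} does not fit in {width} digits")
    return f"MILESTONE#{seq:0{width}d}"


print(sorted(f"MILESTONE#{n}" for n in (2, 10, 9)))
# ['MILESTONE#10', 'MILESTONE#2', 'MILESTONE#9']
print(sorted(milestone_sk(n) for n in (2, 10, 9)))
# ['MILESTONE#000002', 'MILESTONE#000009', 'MILESTONE#000010']
```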

The incident as a timeline, because the order of moves is the lesson:

| Time | Symptom | Action taken | Effect | What it should have been |
|---|---|---|---|---|
| Mon 09:00 | ThrottledRequests climbing | (alert fires) | | Ask: hot key or under-provisioned? |
| 09:10 | Throttling at 40% of peak | Considered raising limits | Would not have helped | Check consumed vs provisioned first |
| 09:25 | Still throttling | Opened Contributor Insights | 3 shipment IDs dominate | This was the breakthrough |
| 09:40 | Rollup writes failing | Logged item bytes pre-write | Item > 400 KB confirmed | Two distinct root causes named |
| 11:00 | Mitigated | Write-shard high-volume shipments | Hot partitions clear | Correct day-of fix |
| +3 days | Fixed | Milestones as items + Streams summary | 0 throttles, 0 size errors | The actual fix is modeling |

The lesson on the wall: adaptive capacity had silently smoothed the skew for two years, so the team assumed the key design was fine. Hot-partition risk is a function of the busiest key, not the average — and it stays invisible until the day it isn’t.
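Because the risk is set by the busiest key, the shard count should be derived from that key's peak write rate. A back-of-the-envelope sketch using the standard 1 WCU per 1 KB (rounded up) write cost and the ~1,000 WCU per-partition ceiling; the 80% headroom default and the assumption that writes hash evenly across shards are illustrative choices:

```python
import math

PARTITION_WCU = 1_000  # approximate per-partition write ceiling


def write_units(item_bytes: int) -> int:
    """A standard write costs 1 WCU per 1 KB of item size, rounded up."""
    return max(1, math.ceil(item_bytes / 1024))


def shards_needed(writes_per_sec: int, item_bytes: int,
                  headroom: float = 0.8) -> int:
    """Smallest shard count that keeps each shard under `headroom` of the
    per-partition ceiling, assuming writes hash evenly across shards."""
    required_wcu = writes_per_sec * write_units(item_bytes)
    return max(1, math.ceil(required_wcu / (PARTITION_WCU * headroom)))


print(shards_needed(3_000, 500))    # 4: 3,000 WCU at 800 WCU per shard
print(shards_needed(3_000, 1_500))  # 8: 1.5 KB items cost 2 WCU each
```

Note how item size doubles the answer in the second call: shrinking the hot item can be as effective as adding shards.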

Advantages and disadvantages

Single-table design both enables DynamoDB’s single-digit-millisecond reads at any scale and demands discipline most relational engineers have to unlearn. Weigh it honestly:

| Advantages (why it wins) | Disadvantages (why it bites) |
|---|---|
| One Query returns a parent and all its children: no joins, flat latency at any table size | The model is rigid: a new access pattern can mean a new GSI or a backfill, not a free ad-hoc query |
| Co-located item collections eliminate multi-round-trip reads | Item collections concentrate writes, so co-location is also hot-key risk |
| Overloaded GSIs serve many patterns within the 20-index cap | The keys are opaque (PK/SK with prefixes) and harder to read than named columns |
| Sparse indexes hold only the working set, so ops queries touch a fraction of data | Forgetting to remove a sparse key leaves stale items in the index |
| Capacity is per-request and predictable; you pay for the traffic shape you have | Per-partition ~1,000 WCU / ~3,000 RCU ceiling is invisible until a hot key hits it |
| Transactions and condition expressions give write-time invariants without locks | Transactions cost 2× WCU and fail the whole batch on one conflict |
| Schemaless items make adding attributes free | The 400 KB item limit is absolute; append-style data must be re-modeled |

The model is right for high-scale, well-understood workloads where the access patterns are knowable up front and read latency must stay flat as data grows — SaaS, commerce, event logs, graphs. It is the wrong default for exploratory analytics, ad-hoc reporting, or workloads whose query shapes change weekly; those want a relational store or a lakehouse, and DynamoDB feeds them via Streams or S3 export. The disadvantages are all manageable — but only if you know they exist before you name the first key, which is the entire point of working backward from access patterns.

Hands-on lab

Build the order-management model end to end, prove every access pattern is a single Query/GetItem, watch the sparse GSI hold only the working set, and confirm consumed capacity, all on-demand and free-tier-friendly (delete the table at the end). Run the commands in a POSIX shell (bash or zsh) with the AWS CLI configured; the single-quoted JSON arguments need different escaping in PowerShell or cmd.

Step 1 — Create the table with two overloaded GSIs (on-demand).

aws dynamodb create-table \
  --table-name app-main \
  --attribute-definitions \
    AttributeName=PK,AttributeType=S AttributeName=SK,AttributeType=S \
    AttributeName=GSI1PK,AttributeType=S AttributeName=GSI1SK,AttributeType=S \
    AttributeName=GSI2PK,AttributeType=S AttributeName=GSI2SK,AttributeType=S \
  --key-schema AttributeName=PK,KeyType=HASH AttributeName=SK,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST \
  --global-secondary-indexes '[
    {"IndexName":"GSI1","KeySchema":[{"AttributeName":"GSI1PK","KeyType":"HASH"},{"AttributeName":"GSI1SK","KeyType":"RANGE"}],"Projection":{"ProjectionType":"INCLUDE","NonKeyAttributes":["status","total"]}},
    {"IndexName":"GSI2","KeySchema":[{"AttributeName":"GSI2PK","KeyType":"HASH"},{"AttributeName":"GSI2SK","KeyType":"RANGE"}],"Projection":{"ProjectionType":"KEYS_ONLY"}}
  ]'
aws dynamodb wait table-exists --table-name app-main

Expected: the command returns table metadata; wait blocks until ACTIVE.

Step 2 — Seed a customer, two orders, and line items (entity overloading).

aws dynamodb put-item --table-name app-main --item '{"PK":{"S":"CUST#a1b2"},"SK":{"S":"PROFILE"},"name":{"S":"Acme Co"},"tier":{"S":"GOLD"}}'
aws dynamodb put-item --table-name app-main --item '{"PK":{"S":"CUST#a1b2"},"SK":{"S":"ORDER#2026-06-01#o-9001"},"status":{"S":"OPEN"},"total":{"N":"149.00"},"GSI1PK":{"S":"CUST#a1b2#OPEN"},"GSI1SK":{"S":"2026-06-01#o-9001"},"GSI2PK":{"S":"OPEN"},"GSI2SK":{"S":"2026-06-01#o-9001"}}'
aws dynamodb put-item --table-name app-main --item '{"PK":{"S":"CUST#a1b2"},"SK":{"S":"ORDER#2026-06-03#o-9044"},"status":{"S":"SHIPPED"},"total":{"N":"72.50"},"GSI1PK":{"S":"CUST#a1b2#SHIPPED"},"GSI1SK":{"S":"2026-06-03#o-9044"}}'
aws dynamodb put-item --table-name app-main --item '{"PK":{"S":"ORDER#o-9001"},"SK":{"S":"ITEM#001"},"sku":{"S":"ABC"},"qty":{"N":"2"}}'

Note the SHIPPED order has no GSI2PK — that is the sparse index doing its job.

Step 3 — A2: all orders for the customer, newest first.

aws dynamodb query --table-name app-main \
  --key-condition-expression "PK = :pk AND begins_with(SK, :p)" \
  --expression-attribute-values '{":pk":{"S":"CUST#a1b2"},":p":{"S":"ORDER#"}}' \
  --no-scan-index-forward --return-consumed-capacity TOTAL

Expected: two order items, the 2026-06-03 one first; ConsumedCapacity a fraction of an RCU.

Step 4 — A3: an order with its line items (adjacency list).

aws dynamodb query --table-name app-main \
  --key-condition-expression "PK = :pk" \
  --expression-attribute-values '{":pk":{"S":"ORDER#o-9001"}}'

Expected: the ITEM#001 row returned by one query keyed on the order.

Step 5 — A5: all OPEN orders via the sparse GSI, COUNT only.

aws dynamodb query --table-name app-main --index-name GSI2 \
  --key-condition-expression "GSI2PK = :open" \
  --expression-attribute-values '{":open":{"S":"OPEN"}}' --select COUNT

Expected: Count = 1 — only the OPEN order is in the index, proving sparseness.

Step 6 — A7: ship the order and watch it leave the sparse GSI.

aws dynamodb update-item --table-name app-main \
  --key '{"PK":{"S":"CUST#a1b2"},"SK":{"S":"ORDER#2026-06-01#o-9001"}}' \
  --update-expression "SET #s = :sh, GSI1PK = :g1 REMOVE GSI2PK, GSI2SK" \
  --expression-attribute-names '{"#s":"status"}' \
  --expression-attribute-values '{":sh":{"S":"SHIPPED"},":g1":{"S":"CUST#a1b2#SHIPPED"}}'
# Re-run Step 5: Count is now 0; the item dropped out of the working set.
# GSI1PK moves to CUST#a1b2#SHIPPED too, so the order also leaves GSI1's OPEN partition.
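The same status transition expressed as keyword arguments for boto3's low-level `update_item`. It sets status, moves GSI1PK to the customer's SHIPPED partition, removes the sparse GSI2 keys, and (an addition not in the lab) guards the write with a condition that the order is still OPEN, so a replayed message cannot re-ship it. `ship_order_request` is a hypothetical helper:

```python
def ship_order_request(table: str, cust_id: str, order_sk: str) -> dict:
    """UpdateItem kwargs for the OPEN -> SHIPPED transition."""
    return {
        "TableName": table,
        "Key": {"PK": {"S": f"CUST#{cust_id}"}, "SK": {"S": order_sk}},
        # Move the GSI1 partition and drop the sparse GSI2 keys together.
        "UpdateExpression": "SET #s = :sh, GSI1PK = :g1 REMOVE GSI2PK, GSI2SK",
        # Only an OPEN order may ship; a replay fails the condition.
        "ConditionExpression": "#s = :open",
        "ExpressionAttributeNames": {"#s": "status"},  # status is a reserved word
        "ExpressionAttributeValues": {
            ":sh": {"S": "SHIPPED"},
            ":open": {"S": "OPEN"},
            ":g1": {"S": f"CUST#{cust_id}#SHIPPED"},
        },
    }
```

Usage would be `boto3.client("dynamodb").update_item(**ship_order_request("app-main", "a1b2", "ORDER#2026-06-01#o-9001"))`.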

Validation checklist. You modeled multiple entities in one table, served three different access patterns with single Query calls, saw the sparse GSI hold exactly the open working set, and watched a status update remove an item from that index — all without a Scan. The lab steps mapped to what each proves:

| Step | What you did | What it proves | Real-world analogue |
|---|---|---|---|
| 1 | Create table + 2 overloaded GSIs | One table serves many access shapes | Greenfield schema bring-up |
| 2 | Seed customer/orders/items | Entity overloading co-locates entities | Modeling the domain |
| 3–4 | Query by customer, then by order | Item collections = one-query aggregates | The high-frequency read paths |
| 5 | COUNT on the sparse GSI | Sparse index holds only the working set | The ops dashboard query |
| 6 | Ship order, remove GSI keys | Sparse keys are set/removed deliberately | The status-transition write |

Cleanup (avoid lingering charges).

aws dynamodb delete-table --table-name app-main

Cost note. On-demand bills per request; this lab is a handful of requests — effectively free, and well within the DynamoDB free tier (25 GB storage, 25 provisioned WCU/RCU if you used provisioned instead). Deleting the table stops all storage charges.

Common mistakes & troubleshooting

This is the playbook — the part you bookmark. First as a scannable table you can read mid-incident, then the same entries with the full confirm-command detail underneath.

| # | Symptom | Root cause | Confirm (exact cmd / console path) | Fix |
|---|---|---|---|---|
| 1 | ThrottledRequests while table sits at ~40% of capacity | Hot partition: one PK over the per-partition ~1,000 WCU / ~3,000 RCU ceiling | Contributor Insights top key; ConsumedWriteCapacityUnits below provisioned | Write-shard the PK (#0..#9); fan-out reads |
| 2 | ValidationException: Item size has exceeded the maximum allowed size | An attribute (array) grew the item past 400 KB | Log item bytes before the write; inspect the offending item | Model each increment as its own item (adjacency list) |
| 3 | A query consumes far more RCU than rows returned | It is a Scan + FilterExpression, not a Query | --return-consumed-capacity TOTAL vs row count; confirm it’s a Scan | Add an overloaded GSI / redesign keys so a KeyConditionExpression does the selecting |
| 4 | Base-table writes throttle even though base capacity is healthy | An under-provisioned GSI throttles back onto the base table | WriteThrottleEvents on the index; per-index ConsumedWCU | Provision GSI WCU ≥ base write rate; narrow projection |
| 5 | New GSI returns partial/empty results for old data | Backfill gap: index only has items that already carry its keys | Index Backfilling=true / OnlineIndexPercentageProgress < 100; old items lack GSIxPK | Wait for ACTIVE; run an idempotent throttled backfill |
| 6 | ConditionalCheckFailedException on every retry of an update | Optimistic-concurrency version moved under you | Compare the item’s version to the one you sent | Re-read, re-apply on the new version, retry |
| 7 | TransactionCanceledException under load | A condition failed or two transactions hit the same item | Read CancellationReasons[] per item | Narrow the transaction; add jittered retry; reduce contention |
| 8 | Sparse-GSI “queue” keeps growing, never drains | The sparse key isn’t removed on the state transition | --select COUNT keeps rising; a “done” item still carries GSIxPK | REMOVE GSIxPK, GSIxSK in the transition UpdateItem |
| 9 | Reads sometimes miss a just-written item | Read a GSI (eventually consistent) expecting strong consistency | The item exists on the base table but not yet in the GSI | GetItem the base item, or accept the propagation delay |
| 10 | ProvisionedThroughputExceededException bursts at peak | Provisioned + auto scaling can’t react fast enough to a spike | Throttling correlates with a sharp ramp; provisioned capacity still flat | Switch to on-demand for spiky traffic, or pre-scale |
| 11 | Throttling while migrating to a new key: the backfill itself throttles | A full-speed Scan-and-update backfill melts capacity | Throttling spikes only during the backfill job | Rate-limit the job; export-to-S3 + transform off the live table |
| 12 | BETWEEN/range query returns nothing or wrong rows | SK components ordered fine→coarse, or numbers stored as unpadded strings | Inspect the SK structure of returned vs expected items | Reorder SK coarse→fine; zero-pad or use ISO-8601 |

Before the expanded reasoning, the exception/error reference you scan first — the exact strings DynamoDB throws, what each means for a single-table model, and whether it is the client’s fault (retryable in-place) or a design fault:

| Exception / error string | What it means | Retryable? | Likely single-table cause | First fix |
|---|---|---|---|---|
| ProvisionedThroughputExceededException | Request exceeded provisioned (or burst) capacity | Yes (SDK backs off) | Hot key, or under-provisioned | Shard the key; raise capacity / on-demand |
| ThrottlingException | Control-plane / on-demand throttling | Yes | Sharp spike past warmed capacity | Pre-warm; on-demand; jittered retry |
| ConditionalCheckFailedException | A condition expression evaluated false | No (re-read first) | Optimistic-concurrency version moved | Re-read, re-apply, retry |
| TransactionCanceledException | A transaction was canceled | Sometimes | A per-item condition failed / item conflict | Read CancellationReasons[]; narrow txn |
| TransactionConflictException | Another txn touched the same item | Yes (jitter) | Contention on a hot item | Shard contended item; backoff |
| ValidationException (item size) | “Item size has exceeded the maximum allowed size” | No | An attribute grew past 400 KB | Model increments as separate items |
| ValidationException (key) | Key/attribute type or missing key | No | Wrong type (S vs N), absent key attr | Fix the item shape / key definition |
| ItemCollectionSizeLimitExceededException | An LSI item collection passed 10 GB | No | Too much data under one PK with LSIs | Re-model; prefer GSIs over LSIs |
| ResourceInUseException | Table/index busy (e.g. another GSI op) | Yes (wait) | Two GSI creates/deletes at once | Serialize; one index op in flight |
| LimitExceededException | An account/table limit hit (e.g. 20 GSIs) | No | Too many GSIs / concurrent ops | Overload indexes; request a limit raise |
| ProvisionedThroughputExceeded on a GSI | A GSI hit its own throughput | Yes | Under-provisioned GSI throttling base | Size GSI WCU to base write rate |

The expanded form, with the full reasoning for the entries that bite hardest:

1. ThrottledRequests while the table sits well below provisioned. Root cause: a hot partition, where one partition key exceeds the per-partition ~1,000 WCU / ~3,000 RCU ceiling. Adaptive capacity smooths skew but cannot exceed the per-partition limit. Confirm: CloudWatch Contributor Insights for DynamoDB surfaces the most-throttled partition key; ConsumedWriteCapacityUnits sits far below provisioned while ThrottledRequests > 0. Fix: write-shard the key (PK#0..#9 with a calculated suffix), using at least required WCU ÷ 1,000 shards, rounded up and padded for headroom, so writes fan across physical partitions; reads scatter across the N shards and merge client-side.

2. ValidationException: Item size has exceeded the maximum allowed size. Root cause: an attribute — usually an append-style array (events, line items) — grew the whole item past the absolute 400 KB limit. Confirm: log the serialized item size immediately before the write; the offending item is near or over 400 KB. Fix: stop appending to one item; model each increment as its own item under an adjacency-list SK (MILESTONE#/ITEM#), and keep a small fixed-size “latest” summary item for cheap dashboard reads.

3. A query consumes far more RCU than the rows it returns. Root cause: the operation is a Scan + FilterExpression, which reads every item then discards most — capacity scales with table size, not result size. Confirm: --return-consumed-capacity TOTAL shows RCU vastly larger than the row count; the call is a Scan, not a Query. Fix: design keys or add an overloaded GSI so a KeyConditionExpression does the selection; every steady-state read must be one Query/GetItem.

4. Base-table writes throttle though base capacity looks healthy. Root cause: an under-provisioned GSI. In provisioned mode, a throttled index backpressures and throttles the base-table writes that touch its key attributes. Confirm: WriteThrottleEvents on the index dimension is non-zero while the base table’s consumed WCU is below provisioned. Fix: raise the GSI’s provisioned WCU to at least the base write rate that touches its keys (or switch the table to on-demand, where its GSIs are on-demand as well), and narrow the projection so fewer writes replicate.

5. A new GSI returns partial or empty results for historical data. Root cause: the backfill gap. A new GSI only contains items that already carry its key attributes; existing rows stay invisible until backfilled. Confirm: while the index is still building it reports Backfilling=true / OnlineIndexPercentageProgress < 100; once it is ACTIVE, historical items that lack GSIxPK are the gap, and --select COUNT on old vs new index disagrees. Fix: do not query until ACTIVE; run an idempotent, throttled backfill (parallel Scan + UpdateItem with attribute_not_exists(GSIxPK)), or export-to-S3 + transform + BatchWriteItem.

6. ConditionalCheckFailedException on every retry of an update. Root cause: optimistic-concurrency conflict — another writer bumped the version attribute, so your version = :curv condition fails. Confirm: the item’s current version differs from the one you sent. Fix: re-read the item, re-apply your change on the new version, and retry; consider jittered backoff if conflicts are frequent.

7. TransactionCanceledException under load. Root cause: a transactional write failed because a per-item condition failed or two transactions collided on the same item. Confirm: read CancellationReasons[] in the error — each item reports ConditionalCheckFailed, TransactionConflict, or None. Fix: narrow the transaction to the items that truly need the invariant, add jittered retry, and reduce contention (shard the contended item or use optimistic concurrency for single-item updates).

8. The sparse-GSI “queue” grows forever and never drains. Root cause: the sparse key attribute is not removed on the state transition, so “done” items linger in the index. Confirm: --select COUNT on the index keeps rising; a completed item still carries GSIxPK. Fix: add REMOVE GSIxPK, GSIxSK to the transition UpdateItem (as in A7), so the item leaves the working set the moment it changes state.
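For entry 7, the useful move is to branch on CancellationReasons rather than retrying blindly. A sketch of that triage plus "full jitter" backoff, assuming the reasons list has already been pulled from the error; in recent boto3 versions it is exposed as `e.response["CancellationReasons"]` on the ClientError (verify against your SDK version). The function names and the 'fix'/'retry'/'inspect' labels are hypothetical:

```python
import random
from typing import Dict, List

# Reason codes where the same request may succeed on a later attempt.
RETRYABLE = {"TransactionConflict", "ThrottlingError",
             "ProvisionedThroughputExceeded"}


def classify_cancellation(reasons: List[Dict]) -> str:
    """Triage a TransactionCanceledException from its CancellationReasons
    (one entry per action, in request order):
    'fix'     a condition failed, so re-read before any retry;
    'retry'   only transient conflicts/throttles, so back off and resend;
    'inspect' anything else (e.g. ValidationError)."""
    codes = [r.get("Code", "None") for r in reasons]
    if "ConditionalCheckFailed" in codes:
        return "fix"
    if any(c in RETRYABLE for c in codes) and set(codes) <= RETRYABLE | {"None"}:
        return "retry"
    return "inspect"


def backoff_delay(attempt: int, base: float = 0.05, cap: float = 2.0) -> float:
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Jitter matters here because the conflicting writers are, by definition, hitting the same item: synchronized retries would just collide again.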

Best practices

The alerts worth wiring before the next peak — the leading indicators, not the lagging “table throttling”:

| Alert on | Metric | Threshold (starting point) | Why it’s leading |
|---|---|---|---|
| Hot key | Contributor Insights most-throttled key | Any sustained single-key dominance | Names the key before broad throttling |
| Write throttling | WriteThrottleEvents (table + each GSI) | > 0 sustained 5 min | Catches a GSI bottleneck early |
| Read throttling | ReadThrottleEvents | > 0 sustained 5 min | Hot read key or under-provisioned read |
| Consumed vs provisioned | ConsumedWCU / provisioned | < 50% while throttling | The hot-partition signature |
| System errors | SystemErrors (5xx) | > 0 | Distinguishes platform from your throttling |
| Conditional failures | ConditionalCheckFailedRequests | Rising trend | Concurrency contention building |
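The write-throttling row translated into CloudWatch alarm arguments, as a sketch. The dict is shaped for boto3's `cloudwatch.put_metric_alarm(**kwargs)`; DynamoDB publishes per-index metrics under the TableName plus GlobalSecondaryIndexName dimensions. The alarm naming scheme and the optional SNS topic are assumptions:

```python
from typing import Optional


def write_throttle_alarm(table: str, index: Optional[str] = None,
                         sns_topic_arn: Optional[str] = None) -> dict:
    """put_metric_alarm kwargs: fire when WriteThrottleEvents > 0 in each
    of 5 consecutive 1-minute periods, for the table or one of its GSIs."""
    dims = [{"Name": "TableName", "Value": table}]
    if index:
        dims.append({"Name": "GlobalSecondaryIndexName", "Value": index})
    suffix = f"-{index}" if index else ""
    alarm = {
        "AlarmName": f"{table}{suffix}-write-throttle",
        "Namespace": "AWS/DynamoDB",
        "MetricName": "WriteThrottleEvents",
        "Dimensions": dims,
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 5,
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no data means no throttling
    }
    if sns_topic_arn:
        alarm["AlarmActions"] = [sns_topic_arn]
    return alarm
```

Creating one alarm per GSI, not just one for the table, is what makes this a leading indicator for the index-backpressure failure.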

Security notes

The security controls that also prevent operational incidents — secure and resilient pull the same direction:

Control | Mechanism | Secures against | Also prevents
KMS CMK encryption | SSESpecification + key policy | Plaintext-at-rest exposure | Unauthorized restore from snapshots
Tenant isolation | dynamodb:LeadingKeys condition | Cross-tenant reads | A tenant hot-keying another’s partition
Attribute-level scope | dynamodb:Attributes condition | Over-reading sensitive fields | Accidental wide projections
VPC endpoint + policy | Gateway/PrivateLink endpoint | Public-internet exposure | Data exfiltration paths
PITR | Continuous backups | Data loss / bad deploy | Migration export depends on it
Least-privilege IAM | Split read/write/admin roles | Broad dynamodb:* blast radius | A bad job running Scan/DeleteTable
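The tenant-isolation row can be made concrete with an IAM policy document. This is a sketch under assumptions: every tenant's partition keys (on the table and on its GSIs) start with a hypothetical TENANT#<id># prefix, and prefix matching of dynamodb:LeadingKeys through StringLike is something you should confirm with the IAM policy simulator before relying on it. In production you would more likely use a policy variable such as ${aws:PrincipalTag/tenant} than one generated policy per tenant.

```python
import json


def tenant_item_policy(table_arn: str, tenant_id: str) -> str:
    """IAM policy JSON letting one tenant touch only partition keys under TENANT#<id>#."""
    statement = {
        "Sid": "TenantScopedItemAccess",
        "Effect": "Allow",
        # No Scan: it carries no partition key, and ForAllValues over an empty set
        # evaluates to true, so the condition below would not restrict it.
        "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem",
                   "dynamodb:UpdateItem", "dynamodb:DeleteItem"],
        "Resource": [table_arn, f"{table_arn}/index/*"],
        "Condition": {
            # Every partition key value in the request must carry this tenant's prefix.
            "ForAllValues:StringLike": {"dynamodb:LeadingKeys": [f"TENANT#{tenant_id}#*"]}
        },
    }
    return json.dumps({"Version": "2012-10-17", "Statement": [statement]}, indent=2)
```

Leaving Scan out of the allowed actions is the design choice that matters most here: a role that can Scan can read every tenant regardless of the LeadingKeys condition.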

Cost & sizing

The bill drivers and how they interact with the design:

A rough monthly picture for a small-to-mid production table (~25 GB, ~5M writes/day, ~20M reads/day, two GSIs with INCLUDE):

Cost driver | What you pay for | Rough INR / month | What drives it up | Lever to pull
On-demand writes | Write request units | ~₹3,000–6,000 | Write volume × GSI count | Narrow projections; fewer GSIs
On-demand reads | Read request units (eventual = ½) | ~₹1,500–3,000 | Read volume; strong-consistent reads | Use eventual reads where safe
Provisioned (alt.) | WCU/RCU-hours + reserved | ~₹2,000–4,000 steady | Over-provisioning headroom | Auto scaling + reserved floor
GSI storage | Per-GB projected copies | ~₹500–1,500 | ALL projections; many GSIs | KEYS_ONLY/INCLUDE; sparse
Streams | Stream read request units | ~₹300–800 | Change rate × consumers | Filter at the consumer
PITR + backups | Per-GB-month continuous | ~₹400–1,000 | Table size | Keep, it’s cheap insurance
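To see how "write volume × GSI count" scales, here is a sketch that turns the scenario above into monthly request units. It deliberately stops short of rupees, because per-unit prices vary by region and change over time; multiply by your region's price yourself. The function and its parameters are my own, and the defaults assume items of at most 1 KB that land in every GSI on every write, so treat the result as an upper bound.

```python
import math


def monthly_request_units(writes_per_day: int, reads_per_day: int, gsis_per_write: int,
                          item_kb: float = 1.0, gsi_entry_kb: float = 1.0,
                          eventual_reads: bool = True, days: int = 30):
    """Return (write request units, read request units) per month for an on-demand table."""
    # Base write: 1 WRU per started 1 KB. Each GSI the write lands in is billed again,
    # on the size of the projected index entry (so narrow projections help).
    wru_per_write = math.ceil(item_kb) + gsis_per_write * math.ceil(gsi_entry_kb)
    # Read: 1 RRU per started 4 KB when strongly consistent; eventually consistent costs half.
    rru_per_read = math.ceil(item_kb / 4) * (0.5 if eventual_reads else 1.0)
    return writes_per_day * days * wru_per_write, reads_per_day * days * rru_per_read


print(monthly_request_units(5_000_000, 20_000_000, gsis_per_write=2))
# (450000000, 300000000.0)
```

The two GSIs triple the write units (150M base writes become 450M WRUs), which is why the write row is the largest line on the bill. Switching every read to strongly consistent doubles the read side to 600M RRUs.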

Free-tier reality: DynamoDB’s perpetual free tier covers 25 GB of storage and 25 provisioned WCU + 25 RCU (enough for ~200M requests/month) on provisioned mode, a real production-grade allowance for small workloads. On-demand requests get no perpetual free allowance, but they are pay-per-use, so a low-traffic table costs pennies. The cheapest correct design is almost always “fewer, narrower GSIs + the right capacity mode,” not a bigger anything. It is the same lesson as the hot-partition fix: model it right and the bill follows.
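The ~200M figure is easy to reproduce. This sketch assumes a 30-day month, capacity saturated every second, writes of at most 1 KB, and reads of at most 4 KB that are eventually consistent. Real traffic with idle hours gets far fewer requests out of the same 25 + 25 units.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 s in a 30-day month


def free_tier_requests(wcu: int = 25, rcu: int = 25, eventual_reads: bool = True):
    """Maximum requests per month if the provisioned capacity is used flat-out."""
    writes = wcu * SECONDS_PER_MONTH              # 1 WCU = one write of up to 1 KB per second
    reads_per_rcu = 2 if eventual_reads else 1    # 1 RCU = one strong or two eventual 4 KB reads/s
    reads = rcu * reads_per_rcu * SECONDS_PER_MONTH
    return writes, reads, writes + reads


print(free_tier_requests())  # (64800000, 129600000, 194400000)
```

With strongly consistent reads the total drops to 129.6M, so the "~200M" headline quietly assumes eventual consistency.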

Interview & exam questions

1. Why design DynamoDB schemas “backward” from access patterns instead of from entities? Because DynamoDB has no server-side join and no query planner — the only way to relate items cheaply is to co-locate them at write time. If you model entities first, you discover the queries your application needs require joins DynamoDB can’t do, and you refactor. Enumerating access patterns first lets you design keys that make each query a single-partition read.

2. What is entity overloading and why does single-table design depend on it? Naming the primary-key attributes generically (PK, SK) and encoding the entity type into the value with a prefix (CUST#, ORDER#, ITEM#), so multiple entity types share one table and one partition can hold a parent plus its children (an item collection). It’s the mechanism that lets a single Query return an aggregate, which is the whole point of single-table design.

3. Explain a sparse GSI and a real use for one. An item appears in a GSI only if it has both of that index’s key attributes — so if you write the GSI key only while an item is in a particular state and remove it on transition, the index holds exactly that working set. The canonical use is an “open orders” queue: index only OPEN orders, remove the key when they ship, and the ops dashboard queries a fraction of the data.

4. What are the per-partition throughput limits and why do they cause throttling at low table utilization? A single physical partition sustains roughly 1,000 WCU and 3,000 RCU. A hot key concentrates traffic on one partition and hits that ceiling even though the table’s total provisioned capacity is barely touched — so you see ThrottledRequests while ConsumedWriteCapacityUnits is at 40% of provisioned. The fix is write-sharding, not more capacity.

5. How does write-sharding work and what does it trade? You append a suffix (#0..#9) to the partition key so writes fan out across multiple physical partitions, sizing the shard count as required WCU / 1,000. A calculated suffix (a hash of a key attribute) lets a point read recompute the exact shard; a random suffix maximizes spread but forces reads to scatter across all N shards and merge client-side. The trade is read fan-out for write throughput.

6. When do you choose on-demand vs provisioned capacity? By traffic shape. On-demand for new, spiky, or unpredictable workloads — it scales instantly and needs no planning. Provisioned + auto scaling (with a reserved floor for the baseline) for steady, predictable, diurnal load — it’s materially cheaper per request. You can only switch modes once per 24 hours, so it’s not a runtime knob.

7. What’s the difference between a FilterExpression and a KeyConditionExpression in cost terms? A KeyConditionExpression selects items by key before reading them, so you pay only for what you select. A FilterExpression runs after the key query and reduces the returned payload but not the capacity consumed or items examined. A pattern served only by Scan + FilterExpression reads the whole table and pays for it.

8. How do you model a many-to-many relationship in a single table? Materialize the edge as an item and store both directions by writing generic GSI keys that invert PK and SK. For users-in-groups, the membership item has PK=USER#u1, SK=GROUP#g1 and GSI1PK=GROUP#g1, GSI1SK=USER#u1: the base table answers “groups for a user,” GSI1 answers “users in a group” — one item, two query directions, no second write to keep in sync.

9. What does TransactWriteItems guarantee and what does it cost? All-or-nothing across up to 100 items (and multiple tables), each with its own condition; if any condition fails or it collides with another transaction on the same item, the whole thing is canceled (TransactionCanceledException with per-item reasons). It costs 2× the WCU of the equivalent non-transactional writes (prepare + commit). Use it for genuine multi-item invariants, not as a default.

10. How do you add a GSI to a live, high-traffic table without downtime or stale data? Call UpdateTable to create the GSI. It is an online operation that backfills in the background while the table stays available; don’t query the index until it’s ACTIVE (watch OnlineIndexPercentageProgress). The index only ever contains items that carry its key attributes, so if existing items lack them you run an idempotent, throttled backfill (parallel Scan + conditional UpdateItem), or export to S3, transform offline, and BatchWriteItem the results back, which keeps the read side of the backfill off the table’s live capacity.

11. What is the 400 KB limit and how do you design around it? It’s the absolute maximum size of a single item, including all attribute names and values. The classic violation is appending to an unbounded array (events, line items) until the rollup item crosses it and writes fail with ValidationException. Design around it by modeling each increment as its own item under an adjacency-list sort key, and keeping a small fixed-size summary item for cheap reads.

12. Why can an under-provisioned GSI throttle your base table? Under provisioned capacity mode, a GSI has its own throughput; if a write touches a projected attribute and the GSI can’t absorb the replicated write, that backpressure throttles the base-table write too. So an under-provisioned index becomes a write bottleneck for the whole table — size each GSI’s WCU to the base write rate that touches its keys, and project narrowly to reduce replication.
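Questions 2, 3, 8 and 11 are all key-shape patterns, so they can be exercised together against an in-memory stand-in for a table. This is a sketch, not the DynamoDB API: query() mimics a Query with PK = :pk AND begins_with(SK, :prefix), and an item is only visible through a key pair if it carries both attributes, which is the sparse-index rule. Every ID, prefix, and attribute name here (GSI1PK/GSI1SK, MILESTONE#, the sample data) is hypothetical.

```python
from typing import Dict, List

Item = Dict[str, str]


def query(items: List[Item], pk: str, sk_prefix: str = "",
          pk_attr: str = "PK", sk_attr: str = "SK") -> List[Item]:
    """Stand-in for Query: exact partition key, begins_with on the sort key, sorted by SK."""
    indexed = [i for i in items if pk_attr in i and sk_attr in i]  # sparse rule: both keys present
    hits = [i for i in indexed if i[pk_attr] == pk and i[sk_attr].startswith(sk_prefix)]
    return sorted(hits, key=lambda i: i[sk_attr])


def customer(cid: str, name: str) -> Item:
    # Q2: the entity type lives in the value prefix, so generic PK/SK hold any entity
    return {"PK": f"CUST#{cid}", "SK": f"CUST#{cid}", "name": name}


def order(cid: str, oid: str, status: str) -> Item:
    item = {"PK": f"CUST#{cid}", "SK": f"ORDER#{oid}", "status": status}
    if status == "OPEN":
        item.update(GSI1PK="OPEN", GSI1SK=f"ORDER#{oid}")  # Q3: only OPEN orders are indexed
    return item


def membership(uid: str, gid: str) -> Item:
    # Q8: one edge item; the base table answers groups-for-user, GSI1 answers users-in-group
    return {"PK": f"USER#{uid}", "SK": f"GROUP#{gid}",
            "GSI1PK": f"GROUP#{gid}", "GSI1SK": f"USER#{uid}"}


def milestone(oid: str, iso_ts: str, note: str) -> Item:
    # Q11: each increment is its own small item, so nothing grows toward 400 KB
    return {"PK": f"ORDER#{oid}", "SK": f"MILESTONE#{iso_ts}", "note": note}


table = [
    customer("c1", "Asha"), order("c1", "o1", "OPEN"), order("c1", "o2", "SHIPPED"),
    membership("u1", "g1"), membership("u2", "g1"), membership("u1", "g2"),
    milestone("o1", "2024-06-01T10:00Z", "packed"),
    milestone("o1", "2024-06-02T08:30Z", "shipped"),
]

print([i["SK"] for i in query(table, "CUST#c1")])  # ['CUST#c1', 'ORDER#o1', 'ORDER#o2']
print([i["SK"] for i in query(table, "OPEN", pk_attr="GSI1PK", sk_attr="GSI1SK")])  # ['ORDER#o1']
print([i["GSI1SK"] for i in query(table, "GROUP#g1", pk_attr="GSI1PK", sk_attr="GSI1SK")])  # ['USER#u1', 'USER#u2']
print(query(table, "ORDER#o1", "MILESTONE#")[-1]["note"])  # shipped
```

Note that GSI1 is itself overloaded: the same index serves the open-orders queue and the users-in-group direction, distinguished only by the partition key value. Against real DynamoDB, the "latest milestone" read would be a Query with ScanIndexForward=False and Limit=1 rather than taking the last element.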
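Questions 4, 5, 7 and 12 are arithmetic, so they translate directly into small Python helpers. This is a sketch under stated assumptions. The 1,000 WCU ceiling comes from question 4, and real designs add headroom. SHA-256 routing is one possible "calculated suffix," not a prescribed one. query_rcu models the sum-then-round-to-4 KB rule for Query, and gsi_wcu assumes every base write lands in the index. All function names are my own.

```python
import hashlib
import math
from typing import List

PER_PARTITION_WCU = 1000  # per-partition write ceiling (question 4)


def shard_count(required_wcu: float) -> int:
    """Q5: shards = ceil(required WCU / 1,000), before any headroom."""
    return max(1, math.ceil(required_wcu / PER_PARTITION_WCU))


def calculated_shard_pk(base_pk: str, routing_value: str, shards: int) -> str:
    """Deterministic suffix, so a point read can recompute the shard.
    Uses hashlib, not hash(): Python's str hash is salted per process."""
    digest = hashlib.sha256(routing_value.encode("utf-8")).digest()
    return f"{base_pk}#{int.from_bytes(digest[:8], 'big') % shards}"


def all_shard_pks(base_pk: str, shards: int) -> List[str]:
    """With random suffixes, reads must Query every one of these and merge client-side."""
    return [f"{base_pk}#{n}" for n in range(shards)]


def query_rcu(examined_item_bytes: List[int], eventual: bool = True) -> float:
    """Q7: a Query is billed on everything the key condition reads, summed and rounded
    up to 4 KB. A FilterExpression trims the response afterwards, not this number."""
    units = math.ceil(sum(examined_item_bytes) / 4096)
    return units * 0.5 if eventual else float(units)


def gsi_wcu(base_writes_per_s: float, projected_kb: float, key_change_share: float = 0.0) -> int:
    """Q12: WCU one GSI needs if every base write lands in it. A write that changes the
    GSI key costs two index writes (delete the old entry, put the new one)."""
    return math.ceil(base_writes_per_s * math.ceil(projected_kb) * (1 + key_change_share))


print(shard_count(7_500))  # 8
print(query_rcu([1024] * 1000), query_rcu([1024] * 12))  # 125.0 1.5
print(gsi_wcu(500, 0.8, key_change_share=0.5))  # 750
```

The query_rcu line is the quick-check scenario in numbers: filtering 1,000 examined 1 KB items down to 12 still bills 125 RCU, while a key condition that selects only those 12 bills 1.5.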
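For question 10, here is a sketch of the parallel Scan + conditional UpdateItem backfill. It assumes a boto3 low-level client (items in {"S": ...} attribute-value form), one worker per Scan segment, and a hypothetical GSI2 keyed on GSI2PK/GSI2SK. derive is whatever function you write to compute the new keys from an existing item, and the fixed pause is a crude stand-in for real rate limiting.

```python
import time
from typing import Callable, Dict, Iterator, Optional


def scan_segment(client, table: str, segment: int, total_segments: int,
                 page_limit: int = 100) -> Iterator[Dict]:
    """Yield items from one parallel-Scan segment, following LastEvaluatedKey."""
    kwargs = {"TableName": table, "Segment": segment,
              "TotalSegments": total_segments, "Limit": page_limit}
    while True:
        page = client.scan(**kwargs)
        yield from page.get("Items", [])
        if "LastEvaluatedKey" not in page:
            return
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]


def backfill_update(table: str, item: Dict,
                    derive: Callable[[Dict], Optional[Dict]]) -> Optional[Dict]:
    """UpdateItem kwargs adding the hypothetical GSI2 keys, or None if the item doesn't belong."""
    keys = derive(item)
    if keys is None:
        return None
    return {
        "TableName": table,
        "Key": {"PK": item["PK"], "SK": item["SK"]},
        "UpdateExpression": "SET GSI2PK = :pk, GSI2SK = :sk",
        # attribute_exists(PK): don't resurrect an item deleted since the Scan read it.
        # attribute_not_exists(GSI2PK): skip items already backfilled or written by new
        # code, which is what makes re-running the job safe.
        "ConditionExpression": "attribute_exists(PK) AND attribute_not_exists(GSI2PK)",
        "ExpressionAttributeValues": {":pk": keys["GSI2PK"], ":sk": keys["GSI2SK"]},
    }


def backfill_segment(client, table: str, segment: int, total_segments: int,
                     derive: Callable[[Dict], Optional[Dict]],
                     pause_s: float = 0.02, sleep: Callable[[float], None] = time.sleep) -> int:
    """Process one segment; returns how many items were actually updated."""
    updated = 0
    for item in scan_segment(client, table, segment, total_segments):
        request = backfill_update(table, item, derive)
        if request is None:
            continue
        try:
            client.update_item(**request)
            updated += 1
        except client.exceptions.ConditionalCheckFailedException:
            pass  # already has the key, or was deleted: nothing to do
        sleep(pause_s)  # crude pacing so the backfill can't starve live traffic
    return updated
```

Run one backfill_segment call per segment in a thread pool and watch the table's consumed WCU while it runs. Because every write is conditional, a crashed run can simply be restarted from the beginning.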

These map to the AWS Certified Developer – Associate (DVA-C02), which covers developing solutions with DynamoDB (data modeling, GSIs, capacity, transactions), and to the AWS Certified Solutions Architect – Associate (SAA-C03) and – Professional (SAP-C02) for the cost/capacity/scaling design trade-offs. The hot-partition and throughput mechanics also surface in the Data Engineer – Associate (DEA-C01). A compact cert mapping for revision:

Question theme | Primary cert | Exam objective area
Access patterns, key/GSI design | DVA-C02 | Develop with DynamoDB; data modeling
Sparse indexes, projections | DVA-C02 | Optimize DynamoDB access
Hot partitions, write-sharding | DEA-C01 / SAP-C02 | Design for performance at scale
On-demand vs provisioned, auto scaling | SAA-C03 | Design cost-optimized, resilient storage
Transactions, condition expressions | DVA-C02 | Implement data consistency
Online GSI add, backfill, Streams | DEA-C01 | Operationalize data pipelines

Quick check

  1. A Query returns 12 items but --return-consumed-capacity TOTAL reports an RCU far larger than 12 rows would imply. What is almost certainly happening, and what’s the fix?
  2. You see ThrottledRequests > 0 while ConsumedWriteCapacityUnits sits at 40% of provisioned. Name the root cause and the one design change that fixes it.
  3. True or false: a Global Secondary Index can be read with strong consistency.
  4. An “open jobs” sparse GSI keeps growing and never shrinks even as jobs complete. What did the code forget to do?
  5. Your rollup item’s writes start failing with ValidationException: Item size has exceeded the maximum allowed size. What’s the cause and the re-modeling fix?

Answers

  1. The operation is really a Scan + FilterExpression (or a query reading far more than it returns), so you pay capacity for items read and then discarded. The fix is to make the keys do the selection — design the PK/SK or add an overloaded GSI so a KeyConditionExpression selects, turning it into a true single-partition Query.
  2. A hot partition — one partition key is over the per-partition ~1,000 WCU / ~3,000 RCU ceiling while the table’s total is barely used; adaptive capacity can’t exceed the per-partition limit. The fix is to write-shard that key (suffix #0..#9, sized to required WCU / 1,000) so writes fan across physical partitions.
  3. False. GSIs support eventually consistent reads only; there is no strongly consistent GSI read. If you need strong consistency, read the base-table item with ConsistentRead enabled (GetItem or Query), or use an LSI, which supports strongly consistent reads within the same partition.
  4. It forgot to remove the sparse key attributes (REMOVE GSIxPK, GSIxSK) on the state transition. A sparse index keeps any item that still has both key attributes, so “done” items linger until you strip the keys.
  5. An attribute — typically an append-style array — has grown the whole item past the absolute 400 KB limit. Re-model by storing each increment as its own item under an adjacency-list sort key (ITEM#/MILESTONE#), and keep a small fixed-size “latest” summary item for the dashboard read.

Glossary

Next steps

You can now take an access-pattern list, derive an overloaded key schema and GSIs that serve every read in one call, and defend it against hot partitions and the 400 KB limit. Build outward:

Need this built for real?

Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.

