DynamoDB Single-Table Design: Modeling Access Patterns, GSIs, and Hot Partition Avoidance

Relational instincts are the single biggest reason DynamoDB projects go sideways. You normalize entities into tables, then discover the only join DynamoDB offers is the one you pre-compute at write time. Single-table design is the discipline of collapsing many entity types into one physical table so the queries your application runs become single-partition reads. It is not about saving on table count — it is about co-locating related items so a Query returns a parent and its children in one round trip, at single-digit-millisecond latency, regardless of table size.

This is the process I follow on every greenfield DynamoDB design: enumerate access patterns first, design keys to satisfy them, overload indexes, then defend the schema against hot partitions and the 400 KB item ceiling. The order matters. Start from the entities and you will refactor; start from the queries and you will ship. By the end you will be able to take an access-pattern list, derive a key schema and a set of overloaded GSIs that serve every read in one call, and prove the model against real consumed-capacity numbers rather than intuition.

This article is a reference you will return to mid-design and mid-incident, so the moving parts (key components, GSI projection choices, capacity-mode trade-offs, error/limit codes, and a symptom→cause→confirm→fix playbook) are all laid out as scannable tables alongside the prose and the AWS CLI, Terraform, and Python snippets. Read the prose once to build the mental model, then keep the tables open when you are actually modeling.

What problem this solves

The pain DynamoDB single-table design addresses is specific: a NoSQL store that punishes you for thinking relationally. In a relational database you model the entities, normalize, and let the query planner figure out joins at read time. DynamoDB has no query planner and no server-side join. The only “join” is placement — items you decided at write time to store under the same partition key. If you model entities into separate tables the relational way, every screen that shows a parent and its children becomes two or three round trips, each its own network hop and its own capacity charge, and the “list all X for a Y” screens degrade into Scan operations that read the entire table and throw most of it away.

What breaks without this discipline: latency that climbs with table size instead of staying flat, capacity bills dominated by Scan-and-filter waste, and throttling that looks like insufficient capacity but is actually a hot key. Teams “fix” the throttling by raising provisioned capacity, the bill doubles, and the hot partition still throttles because the ceiling is per-partition, not per-table. Others discover the 400 KB item limit the day a rollup array finally crosses it in production, with a ValidationException and no graceful degradation.

Who hits this: anyone building a serverless or high-scale application on DynamoDB — multi-tenant SaaS, order/inventory systems, event logs, social graphs, session stores. It bites hardest on time-series workloads (the date-as-partition-key trap), mega-aggregate items (an order with thousands of line items, a shipment with thousands of scans), and any service whose access patterns were not written down before the keys were named. The fix is almost never “add capacity” — it is “make the keys do the selection the query needs.”

To frame the whole field before the deep dive, here is every failure class this article covers, the relational instinct that causes it, and the single-table move that prevents it:

Failure class	The relational instinct behind it	What it costs in production	The single-table move
Multi-round-trip reads	Normalize entities into separate tables	2–3 network hops + capacity per screen	Co-locate parent + children under one PK (item collection)
`Scan` creep	“Just filter the table” for a new pattern	RCU scales with table size, not result size	Design keys / add an overloaded GSI so `KeyConditionExpression` selects
Hot partition throttling	Key on something low-cardinality (date, status)	`ThrottledRequests` while table sits at 40%	Write-shard the PK to fan across physical partitions
400 KB item failure	Append to one growing attribute (array)	`ValidationException` with no fallback	Model each increment as its own item (adjacency list)
GSI write bottleneck	Treat indexes as free	Under-provisioned GSI throttles the base table	Size the GSI to base write rate; narrow the projection
Stale data after a new GSI	Assume an index sees existing rows	Queries silently miss history	Idempotent, throttled backfill before you query

Learning objectives

By the end of this article you can:

Produce a complete access-pattern list — every read and write with its filter, sort order, and cardinality — and treat it as the design artifact the key schema is derived from.
Compose a partition key and sort key with entity overloading and type prefixes so many entity types share one table and a single Query returns an aggregate.
Build Global Secondary Indexes that use index overloading, sparse-index semantics, and the right projection (KEYS_ONLY / INCLUDE / ALL) for each access shape.
Model one-to-many (adjacency list), hierarchies, and many-to-many relationships by placement and key-flips rather than runtime second reads.
Diagnose and eliminate hot partitions with write-sharding sized to required throughput, and reason about the per-partition ~1,000 WCU / ~3,000 RCU ceiling versus adaptive capacity.
Choose between on-demand and provisioned capacity by traffic shape, and pair provisioned with target-tracking auto scaling and a reserved floor.
Enforce write-time invariants with condition expressions, optimistic concurrency, and TransactWriteItems, and know what each costs.
Evolve a live schema without downtime using online GSI creation, idempotent throttled backfills, DynamoDB Streams, and export-to-S3 transforms.

Prerequisites & where this fits

You should already understand DynamoDB’s primitives: a table holds items (rows) made of attributes; an item is identified by a primary key that is either a single partition key or a composite partition key (PK) + sort key (SK); read capacity units (RCU) and write capacity units (WCU) meter throughput; and the API verbs are GetItem, PutItem, UpdateItem, DeleteItem, Query, Scan, BatchGetItem, BatchWriteItem, and TransactWriteItems/TransactGetItems. You should be comfortable running the AWS CLI, reading DynamoDB JSON (the {"S": "..."} typed form), and reasoning about eventual versus strong consistency.

This sits in the AWS data-modeling track and is the deep, prescriptive companion to the broader DynamoDB Deep Dive: Tables, Keys, Capacity, GSIs & Streams — that article surveys the service; this one is the schema-design craft. It pairs with DynamoDB Streams: Change Data Capture & Event-Driven Pipelines for the reshaping/reconciliation machinery, and the same partition-design principles transfer to Cosmos DB Partition Key Design & RU Optimization on Azure. Where this fits in the bigger picture: it is upstream of every serverless API you build on DynamoDB, because the key schema decides whether your Lambda handlers run one query or three.

A quick map of who owns which decision in a single-table design, so the right person reviews the right thing:

Decision layer	What is decided here	Who usually owns it	What goes wrong if skipped
Access-pattern list	Every read/write, filter, sort, cardinality	Product + backend lead	Schema gets refactored after launch
Key schema (PK/SK)	Overloading, prefixes, item collections	Data modeler / senior backend	Multi-round-trip reads; `Scan` creep
GSI design	Overloading, sparseness, projection	Data modeler	Wrong access shape; write bottleneck
Capacity mode	On-demand vs provisioned + auto scaling	Backend + FinOps	Over-pay or throttle under spike
Hot-key defense	Write-sharding, fan-out reads	Senior backend	Per-partition throttling at peak
Schema evolution	Online GSI add, backfill, Streams	Platform / data eng	Downtime or stale index data

Core concepts

Six mental models make every later decision obvious.

The schema is downstream of the queries. In DynamoDB you do not model entities and then query them; you enumerate the queries and then design keys that make each query a single-partition read. The access-pattern list — every read and write with its filter, sort order, and cardinality — is the artifact you review with the team. If you cannot serve an access pattern with one Query or GetItem, the schema is incomplete, not the application.

Filtering is not querying. A KeyConditionExpression selects items by key before reading them; a FilterExpression runs after the key query and before results return. Filtering reduces the payload you receive but never the capacity consumed or the items examined. Any pattern that can only be served by Scan + FilterExpression reads the whole table and pays for it — that is a modeling bug, not a tuning opportunity.
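The capacity side of that rule is easy to check with arithmetic. The sketch below assumes the documented metering for one Query page: sum the sizes of every item the key condition read (before any FilterExpression), round up to the next 4 KB, and halve the result for eventually consistent reads. `query_rcu` is a hypothetical helper, not an SDK call.

```python
import math

RCU_BLOCK_BYTES = 4 * 1024  # one RCU covers 4 KB of strongly consistent read


def query_rcu(examined_item_sizes: list[int], strongly_consistent: bool = False) -> float:
    """Approximate RCU for one Query page.

    examined_item_sizes: sizes in bytes of every item the key condition matched,
    i.e. before a FilterExpression discards any of them.
    """
    blocks = math.ceil(sum(examined_item_sizes) / RCU_BLOCK_BYTES)
    return float(blocks) if strongly_consistent else blocks / 2


# The key condition matches 100 items of 1 KB; a FilterExpression keeps only 5.
examined = [1024] * 100
kept = examined[:5]

billed = query_rcu(examined)        # what you pay: every examined item
if_keys_selected = query_rcu(kept)  # what a key-selective design would pay
print(billed, if_keys_selected)     # 12.5 1.0
```

Moving the selection from the filter into the key condition is what takes the page from the first number to the second.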

Entity overloading is the core trick. Name the keys generically (PK, SK) and encode the entity type into the value with a prefix (CUST#, ORDER#, ITEM#). Now different entity types coexist in one table, and a single partition can hold a parent row and all its child rows — an item collection — so one Query returns the aggregate. The same idea applied to a secondary index is index overloading: generic GSI1PK/GSI1SK that each entity populates with whatever it needs to be found by.

A GSI is an alternate, asynchronous view. A Global Secondary Index is a second (PK, SK) over the same items, maintained for you on every write with a small propagation delay (eventually consistent only). It has its own throughput and its own projected copy of attributes. Two GSI behaviors are load-bearing: a sparse index contains an item only if the item has both of that index’s key attributes (so you index a working set, not everything), and under provisioned mode an under-provisioned GSI can throttle the base table’s writes.

Partitions are physical and capped. DynamoDB hashes the partition key to choose a physical partition, and a single partition sustains roughly 1,000 WCU and 3,000 RCU. Exceed either on one key and you throttle even when table-level capacity is healthy. Adaptive capacity shifts throughput toward busy partitions and can isolate a single hot item, but it cannot exceed those per-partition limits — a key that needs more than ~1,000 WCU must spread across more than one partition-key value.

Hard limits are absolute. An item, counting every attribute name and value, cannot exceed 400 KB. A sort key value and a partition key value have their own size caps. A table allows 20 GSIs by default (a quota you should design as if it were fixed), with one create/delete in flight at a time. None of these are runtime knobs; you model around them. The append-an-unbounded-array pattern is the classic 400 KB trap.
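A cheap way to catch the array trap in tests is to estimate item size before writing. This sketch (hypothetical helper names) computes only a lower bound: UTF-8 bytes of attribute names plus string values. DynamoDB's exact sizing adds small per-type overheads that are deliberately not modeled here, so if even the lower bound exceeds 400 KB, the write is certain to fail.

```python
MAX_ITEM_BYTES = 400 * 1024  # the 400 KB item ceiling, names and values included


def item_size_lower_bound(item: dict) -> int:
    """Lower-bound size for an item whose values are strings or lists of strings.

    Counts UTF-8 bytes of attribute names and string values only; real
    DynamoDB item size adds per-type overhead, so it is at least this large.
    """
    total = 0
    for name, value in item.items():
        total += len(name.encode("utf-8"))
        values = value if isinstance(value, list) else [value]
        total += sum(len(v.encode("utf-8")) for v in values)
    return total


def definitely_too_big(item: dict) -> bool:
    return item_size_lower_bound(item) > MAX_ITEM_BYTES


# The append-an-array trap: a shipment accumulating one 250-byte scan per event.
shipment = {"PK": "SHIP#s-1", "SK": "META", "scans": ["x" * 250] * 2000}
print(definitely_too_big(shipment))  # True: 2,000 x 250 B alone is ~488 KB
```

Storing each scan as its own item under the shipment's partition key keeps every item small no matter how many events arrive.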

The vocabulary in one table

Before the deep sections, pin down every moving part. The glossary at the end repeats these for lookup; this table is the model side by side:

Term	One-line definition	Where it lives	Why it matters to single-table design
Access pattern	One concrete read or write the app needs	Design doc	The thing the schema is derived from
Partition key (PK)	Hashed key choosing the physical partition	Item primary key	Decides co-location and hot-key risk
Sort key (SK)	Orders items within a partition	Item primary key	Enables range queries and adjacency lists
Entity overloading	Generic `PK`/`SK` + type prefixes	Key naming convention	Lets many entities share one table
Item collection	All items sharing one PK	One physical partition	One `Query` returns parent + children
GSI	Alternate async (PK, SK) over the items	Table-level index	Serves a different access shape
Index overloading	Generic `GSIxPK`/`GSIxSK` per entity	GSI key attributes	One index serves many logical patterns
Sparse index	Item indexed only if it has the keys	GSI semantics	Indexes a working set (e.g. open orders)
Projection	Which attributes a GSI copies	GSI definition	Drives index storage + write cost
Hot partition	One PK taking disproportionate traffic	Physical partition	Throttles at ~1,000 WCU / ~3,000 RCU
Write sharding	Suffixing the PK to fan writes	Key value	Spreads a high-write key across partitions
Adaptive capacity	Auto-shift of throughput to busy keys	Platform behavior	Smooths skew, can’t beat per-partition cap
Condition expression	A write that applies only if a predicate holds	Write request	Enforces invariants atomically
`TransactWriteItems`	All-or-nothing across ≤100 items	Write API	Multi-item invariants; costs 2× WCU

Limits and quotas you model around

These are the hard numbers single-table design is engineered against — none are tunable, so the schema absorbs them. Keep this open when you size keys, shards, projections, and transactions:

Limit / quota	Value	What it constrains	The single-table consequence
Item size	400 KB (names + values)	One item’s total bytes	Append-style data → separate items
Per-partition write	~1,000 WCU	Throughput on one physical partition	High-write keys must be sharded
Per-partition read	~3,000 RCU	Read throughput on one partition	Hot read keys must be sharded/cached
Partition key length	up to 2,048 bytes	PK value size	Keep prefixes short
Sort key length	up to 1,024 bytes	SK value size	Bounded path depth in hierarchies
GSIs per table	20	Number of alternate indexes	Overload indexes to stay under it
LSIs per table	5 (create-time only)	Local indexes	Rarely used in single-table design
LSI item-collection size	10 GB per PK	Total of a partition + its LSIs	A reason to prefer GSIs
`TransactWriteItems` items	100	Items per transaction	Big aggregates split or batch
`BatchWriteItem` items	25	Items per batch call	Loop/paginate large writes
`Query`/`Scan` page size	1 MB	Bytes returned per call	Paginate with `LastEvaluatedKey`
Provisioned throughput decrease	limited per day	Scale-down frequency	Plan auto-scaling min carefully
Mode switch (on-demand ↔ provisioned)	once / 24h	Capacity-mode changes	Not a runtime knob

1. Working backward: enumerate access patterns before touching keys

The schema is downstream of the queries. Before naming a single attribute, write the complete list of access patterns the service needs — every read and write, with its filter, its sort, and its cardinality. This is the artifact you review with the team, not the data model.

For a multi-tenant order-management service, the list looks like this:

#	Access pattern	Type	Frequency	Cardinality concern
A1	Get a customer by ID	Read	High	One item — safe
A2	Get all orders for a customer, newest first	Read	High	Bounded per customer
A3	Get a single order with its line items	Read	High	Bounded per order
A4	List orders in a status (e.g. `SHIPPED`) for a customer	Read	Medium	Bounded per customer
A5	Get all open orders across all customers (ops dashboard)	Read	Low	Cross-tenant — hot-key risk
A6	Create order + line items atomically	Write	High	Transaction (≤100 items)
A7	Update order status	Write	High	Single-item update

The cardinality column is not decoration. A5 — “all open orders across all customers” — will create a hot partition if modeled naively, because every write funnels into one item collection. Flag those now. Each access pattern then maps to a precise key construction, and writing that mapping table before you create the table is what prevents the post-launch refactor:

#	Pattern	Served by	Key expression	Index
A1	Customer by ID	`GetItem`	`PK=CUST#<id>`, `SK=PROFILE`	base
A2	Orders for a customer, newest first	`Query`	`PK=CUST#<id> AND begins_with(SK,"ORDER#")`, `ScanIndexForward=false`	base
A3	Order with line items	`Query`	`PK=ORDER#<id>`	base
A4	Orders in a status for a customer	`Query`	`GSI1PK=CUST#<id>#<status>`	GSI1
A5	All open orders	`Query`	`GSI2PK="OPEN"`	GSI2 (sparse)
A6	Create order + items	`TransactWriteItems`	per-item `attribute_not_exists(PK)`	base
A7	Update order status	`UpdateItem`	`PK=CUST#<id>`, `SK=ORDER#...` + `SET GSI1PK` to the new status bucket, `REMOVE GSI2PK, GSI2SK`	base

Three rules I hold to, and the consequence of breaking each:

Rule	Why it holds	What breaking it costs
No `Scan` in the steady state	`Scan` reads every item then filters	RCU scales with table size; latency grows unbounded
Filtering is not querying	`FilterExpression` runs after the key query	You pay capacity for discarded items
One `Query`/`GetItem` per pattern	A second call means a missing index/item	Latency doubles; consistency window widens

No Scan in the steady state. If a pattern can only be served by a Scan with a FilterExpression, the model is wrong. Scan reads every item and then filters, so you pay read capacity for data you discard.
Filtering is not querying. FilterExpression runs after the key query, before results return. It reduces payload, never capacity consumed or items examined. Design keys so the KeyConditionExpression does the selection.
Every pattern maps to exactly one Query/GetItem on the base table or a GSI. If a pattern needs two queries, you have a missing index or a missing pre-computed item.
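One way to keep the mapping table honest is to make it executable. This sketch returns exactly one request per pattern, as keyword arguments shaped for boto3's low-level `get_item` and `query` calls. The table name comes from the CLI examples in this article; the index names `GSI1`/`GSI2` and the helper names are assumptions. It deliberately raises for any pattern that has no single-call mapping.

```python
TABLE = "app-main"  # from the CLI examples; adjust per environment


def s(value: str) -> dict:
    """Wrap a string in DynamoDB JSON's typed form: {"S": value}."""
    return {"S": value}


def request_for(pattern: str, **ids: str) -> tuple[str, dict]:
    """Return (client method name, kwargs) for the single call that serves a pattern."""
    if pattern == "A1":  # GetItem on the profile row
        return "get_item", {
            "TableName": TABLE,
            "Key": {"PK": s(f"CUST#{ids['customer_id']}"), "SK": s("PROFILE")},
        }
    if pattern == "A2":  # orders for a customer, newest first
        return "query", {
            "TableName": TABLE,
            "KeyConditionExpression": "PK = :pk AND begins_with(SK, :prefix)",
            "ExpressionAttributeValues": {
                ":pk": s(f"CUST#{ids['customer_id']}"),
                ":prefix": s("ORDER#"),
            },
            "ScanIndexForward": False,
        }
    if pattern == "A3":  # order metadata plus line items
        return "query", {
            "TableName": TABLE,
            "KeyConditionExpression": "PK = :pk",
            "ExpressionAttributeValues": {":pk": s(f"ORDER#{ids['order_id']}")},
        }
    if pattern in ("A4", "A5"):  # overloaded GSI1 or sparse GSI2
        index, pk_attr, pk = (
            ("GSI1", "GSI1PK", f"CUST#{ids['customer_id']}#{ids['status']}")
            if pattern == "A4"
            else ("GSI2", "GSI2PK", "OPEN")
        )
        return "query", {
            "TableName": TABLE,
            "IndexName": index,
            "KeyConditionExpression": f"{pk_attr} = :pk",
            "ExpressionAttributeValues": {":pk": s(pk)},
        }
    raise KeyError(f"{pattern} has no single-call mapping: the model is incomplete")
```

In a handler you would pass the result straight through, for example `getattr(boto3.client("dynamodb"), api)(**kwargs)`; keeping boto3 out of the sketch lets the mapping be unit-tested without AWS.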

2. Primary key design: partition/sort composition and entity overloading

DynamoDB gives you a composite primary key: a partition key (PK, decides the physical partition via an internal hash) and a sort key (SK, orders items within that partition). The power of single-table design comes from entity overloading — naming the keys generically (PK, SK) so different entity types can share the table, and encoding the type into the value with a prefix.

Here is the data that satisfies A1, A2, A3, and A6. A customer and all of their orders live under the customer's partition key, and each order's metadata and line items live under the order's partition key:

PK                  SK                       attributes
------------------- ------------------------ ----------------------------------
CUST#a1b2           PROFILE                  name, email, tier, createdAt
CUST#a1b2           ORDER#2026-06-01#o-9001  status=OPEN, total=149.00
CUST#a1b2           ORDER#2026-06-03#o-9044  status=SHIPPED, total=72.50
ORDER#o-9001        META                     customer=CUST#a1b2, status=OPEN
ORDER#o-9001        ITEM#001                 sku=ABC, qty=2, price=49.50
ORDER#o-9001        ITEM#002                 sku=XYZ, qty=1, price=50.00

The full key map for this model — every entity type and its base-table and GSI keys — is the single sheet you keep next to the code. This is “enumerate everything”: each row is one entity, and the prefixes are the contract the whole service shares:

Entity	PK	SK	GSI1PK	GSI1SK	GSI2PK	In sparse GSI2 when
Customer profile	`CUST#<id>`	`PROFILE`	—	—	—	never
Order (by customer)	`CUST#<id>`	`ORDER#<date>#<oid>`	`CUST#<id>#<status>`	`<date>#<oid>`	`OPEN`	status = OPEN
Order metadata (by order)	`ORDER#<oid>`	`META`	—	—	—	never
Line item	`ORDER#<oid>`	`ITEM#<seq>`	—	—	—	never
Payment	`ORDER#<oid>`	`PAYMENT#<ts>`	—	—	—	never
Address	`CUST#<id>`	`ADDR#<label>`	—	—	—	never
Membership (user↔group)	`USER#<uid>`	`GROUP#<gid>`	`GROUP#<gid>`	`USER#<uid>`	—	never
Category node	`CATALOG`	`CATEGORY#<path>`	—	—	—	never
Inventory level	`SKU#<sku>`	`STOCK`	—	—	`LOW`	qty < threshold
Audit event	`ORDER#<oid>`	`EVENT#<ts>`	—	—	—	never
Idempotency token	`IDEMP#<key>`	`LOCK`	—	—	—	never
Session	`SESSION#<sid>`	`META`	`USER#<uid>`	`SESSION#<sid>`	—	never
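Because the prefixes are a contract, it pays to build keys in one place instead of with ad-hoc f-strings scattered across handlers. A minimal sketch for the order and line-item rows, with hypothetical function names. It assumes GSI2 also has a sort key named `GSI2SK` (the A7 update removes one), so an open order must carry both GSI2 attributes to appear in the sparse index.

```python
from datetime import date


def customer_pk(customer_id: str) -> str:
    return f"CUST#{customer_id}"


def order_keys(customer_id: str, order_id: str, placed_on: date, status: str) -> dict[str, str]:
    """Base-table and GSI key attributes for an order row stored under its customer."""
    day = placed_on.isoformat()  # ISO-8601: lexicographic order matches chronological order
    keys = {
        "PK": customer_pk(customer_id),
        "SK": f"ORDER#{day}#{order_id}",
        "GSI1PK": f"CUST#{customer_id}#{status}",  # overloaded GSI1 serves A4
        "GSI1SK": f"{day}#{order_id}",
    }
    if status == "OPEN":
        # Both GSI2 key attributes exist only while the order is open (sparse GSI2).
        keys["GSI2PK"] = "OPEN"
        keys["GSI2SK"] = f"{day}#{order_id}"
    return keys


def line_item_keys(order_id: str, seq: int) -> dict[str, str]:
    # Zero-padded sequence so ITEM#010 sorts after ITEM#002.
    return {"PK": f"ORDER#{order_id}", "SK": f"ITEM#{seq:03d}"}
```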

Two design choices do the heavy lifting:

Prefixes make sort keys range-queryable by type. A2 (“orders for a customer, newest first”) is PK = CUST#a1b2 AND begins_with(SK, "ORDER#"), with ScanIndexForward = false to reverse the sort. Because the order date is the first sortable component of the SK (ISO-8601), newest-first falls out for free.
Line items hang off the order, not the customer. A3 (“an order with its items”) is PK = ORDER#o-9001 — one query returns the order metadata row and its ITEM# rows, because they share a partition. This is the adjacency-list pattern.

Writing A6 (order plus line items, atomically) uses TransactWriteItems, covered in Section 7.

# A2: all orders for a customer, newest first (DynamoDB JSON via AWS CLI)
aws dynamodb query \
  --table-name app-main \
  --key-condition-expression "PK = :pk AND begins_with(SK, :prefix)" \
  --expression-attribute-values '{":pk":{"S":"CUST#a1b2"},":prefix":{"S":"ORDER#"}}' \
  --no-scan-index-forward

The key-condition operators you can use on a sort key — and the one you cannot — decide what queries the SK supports, so choose the SK structure against this table:

SK operator	Example	Use for	Note
`=`	`SK = "PROFILE"`	Exact child row	Single item
`begins_with`	`begins_with(SK,"ORDER#")`	All children of a type	The workhorse
`BETWEEN`	`SK BETWEEN "ORDER#2026-06-01" AND "ORDER#2026-06-30~"`	Date/range slice	Needs coarse-to-fine SK; the trailing `~` sorts after `#`, so June 30 orders stay in range
`<`, `<=`, `>`, `>=`	`SK > "ORDER#2026-06-01"`	One-sided range	Pagination boundaries
(none on PK)	—	PK is always `=`	You cannot range a PK

The data-type and encoding choices for keys are not cosmetic — they decide whether sorting and ranges behave. Pick deliberately:

Choice	Options	Pick when	Gotcha
PK/SK type	`S` (string), `N` (number), `B` (binary)	`S` for prefixed overloaded keys	`N` sorts numerically; `S` lexicographically
Timestamp format	ISO-8601 string vs epoch number	ISO-8601 in an `S` SK	Numbers as strings sort wrong (`"10" < "9"`)
Delimiter	`#`, `|`, `~`	`#` (convention)	Must never appear inside a key component
Component order	coarse → fine	Range/sort on the coarse part	Wrong order kills `BETWEEN`
Zero-padding	pad numeric components in `S`	Numbers embedded in string SKs	`item#7` sorts after `item#10` unpadded
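These ordering rules can be checked locally: string key values compare by byte value, and for ASCII keys Python's `str` ordering is the same. A sketch with a hypothetical `between` helper that mimics an inclusive sort-key `BETWEEN` over one partition:

```python
def between(keys: list[str], low: str, high: str) -> list[str]:
    """Emulate an inclusive sort-key BETWEEN over string keys, in index order."""
    return [k for k in sorted(keys) if low <= k <= high]


orders = [
    "ORDER#2026-05-31#o-8800",
    "ORDER#2026-06-01#o-9001",
    "ORDER#2026-06-30#o-9500",
    "ORDER#2026-07-01#o-9600",
]

# Coarse-to-fine SK: a date range is one contiguous slice.
# The upper bound ends in "~" because "~" sorts after "#", keeping June 30 in.
june = between(orders, "ORDER#2026-06-01", "ORDER#2026-06-30~")

# An upper bound of exactly "ORDER#2026-06-30" silently drops June 30's orders.
june_truncated = between(orders, "ORDER#2026-06-01", "ORDER#2026-06-30")

# Unpadded numbers inside string keys sort by character, not by value.
print(sorted(["item#7", "item#10"]))      # ['item#10', 'item#7']
print(sorted(["item#007", "item#010"]))   # ['item#007', 'item#010']
```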

A practical rule for sort-key composition: order the components from coarsest to finest, and only put something in the SK if you will range-query or sort on it. ORDER#<date>#<orderId> lets you filter a date range with BETWEEN; ORDER#<orderId>#<date> does not. The trade-offs across the common key shapes themselves:

Key shape	Co-location	Range queries	Hot-key risk	Best for
PK only (no SK)	None	None	Low (high cardinality)	Pure key/value lookups
PK + simple SK	Per-PK collection	Yes (on SK)	Depends on PK cardinality	Most entities
Overloaded PK + SK	Multi-entity collection	Yes, by prefix	Manage per entity	Single-table core
Low-cardinality PK	Everything in few partitions	Yes	High — throttles	Avoid; shard instead

3. Global secondary indexes: sparse indexes, index overloading, projections

The base table answers patterns keyed on the customer or the order. A4 and A5 need a different access shape — that is what a Global Secondary Index is for: an alternate (PK, SK) over the same items, maintained asynchronously on every write.

Before the techniques, the index-type decision itself — GSI versus LSI — is one you make once at table-design time and (for LSIs) can never undo:

Property	Global Secondary Index (GSI)	Local Secondary Index (LSI)
Partition key	Any attribute (different from base)	Same PK as base table
Sort key	Any attribute	Alternate SK, same PK
When creatable	Any time (online)	Only at table creation
Max per table	20	5
Consistency	Eventual only	Strong or eventual
Throughput	Its own (or shared on-demand)	Shares the table’s
Item-collection size cap	None	10 GB per PK
Single-table fit	The default choice	Rare; the 10 GB cap bites

Three GSI techniques carry single-table design:

Index overloading. Add generic attributes GSI1PK / GSI1SK and let each entity type populate them with whatever it needs to be found by. One physical index serves many logical patterns. For A4 (“orders in a status for a customer”), order items set:

GSI1PK = CUST#a1b2#SHIPPED      GSI1SK = 2026-06-03#o-9044

A4 becomes Query on GSI1 with GSI1PK = CUST#a1b2#SHIPPED.

Sparse indexes. An item appears in a GSI only if it has both of that index’s key attributes — a feature, not a limitation. For A5 (“all open orders across all customers”), do not index every order, only OPEN ones. Write GSI2PK = "OPEN" only while the order is open, and remove the attribute when it ships. The index then holds exactly the working set of open orders, so the ops query touches a fraction of the data. This is the canonical sparse-index pattern: a queue you Query by presence.

# A7: status -> SHIPPED. Moves the order into the SHIPPED bucket of GSI1 (so A4 finds it)
# and REMOVES it from the sparse "open orders" GSI2
aws dynamodb update-item \
  --table-name app-main \
  --key '{"PK":{"S":"CUST#a1b2"},"SK":{"S":"ORDER#2026-06-01#o-9001"}}' \
  --update-expression "SET #s = :shipped, GSI1PK = :gsi1pk REMOVE GSI2PK, GSI2SK" \
  --expression-attribute-names '{"#s":"status"}' \
  --expression-attribute-values '{":shipped":{"S":"SHIPPED"},":gsi1pk":{"S":"CUST#a1b2#SHIPPED"}}'
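What each index "sees" before and after the status transition can be modeled in a few lines. This is an in-memory illustration with a hypothetical `index_view` helper, not how DynamoDB stores indexes. It assumes GSI2 has both a `GSI2PK` partition key and a `GSI2SK` sort key, and it mirrors the A7 transition: the status changes, `GSI1PK` moves to the new status bucket (otherwise A4 keeps finding the order under OPEN), and both GSI2 keys are removed.

```python
def index_view(items: list[dict], pk_attr: str, sk_attr: str) -> dict[str, list[str]]:
    """A GSI contains only items carrying BOTH of its key attributes, grouped by index PK."""
    view: dict[str, list[str]] = {}
    for item in items:
        if pk_attr in item and sk_attr in item:
            view.setdefault(item[pk_attr], []).append(item[sk_attr])
    return view


order = {
    "PK": "CUST#a1b2", "SK": "ORDER#2026-06-01#o-9001", "status": "OPEN",
    "GSI1PK": "CUST#a1b2#OPEN", "GSI1SK": "2026-06-01#o-9001",
    "GSI2PK": "OPEN", "GSI2SK": "2026-06-01#o-9001",
}
print(index_view([order], "GSI2PK", "GSI2SK"))  # {'OPEN': ['2026-06-01#o-9001']}

# The A7 transition: drop the GSI2 keys, change status, move the GSI1 bucket.
shipped = {k: v for k, v in order.items() if k not in ("GSI2PK", "GSI2SK")}
shipped.update(status="SHIPPED", GSI1PK="CUST#a1b2#SHIPPED")

print(index_view([shipped], "GSI2PK", "GSI2SK"))  # {}: gone from open orders
print(index_view([shipped], "GSI1PK", "GSI1SK"))  # {'CUST#a1b2#SHIPPED': ['2026-06-01#o-9001']}
```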

Projection choices. A GSI stores a copy of attributes, billed as extra storage and extra write capacity on every base-table write that touches a projected attribute. Choose deliberately:

Projection	What it stores	Storage / write cost	Use when	One-way?
`KEYS_ONLY`	Index + base keys only	Lowest	You only need the key, then `GetItem`	Widen later via new GSI
`INCLUDE`	Keys + a named attribute list	Moderate (the list)	Project exactly what the query reads	Cannot shrink in place
`ALL`	Every attribute	Highest	Query genuinely needs the whole item	Cannot shrink in place

KEYS_ONLY — index plus base key attributes only. Smallest, cheapest. Use when you only need to find the key, then GetItem the full record.
INCLUDE — keys plus a named list. The right default: project exactly what the index’s queries read.
ALL — every attribute. Most expensive in storage and write cost. Reserve it for indexes whose queries genuinely need the whole item.

Project narrowly and widen later: you can create a new GSI online, but you cannot shrink a projection in place. ALL on a wide, hot item is a line item you will see on the bill. The three GSI techniques mapped to the access patterns they unlock:

Technique	What it does	Access pattern it serves	Cost lever
Index overloading	Generic `GSIxPK/SK` per entity	Many logical patterns on one index	Stays within the 20-GSI cap
Sparse index	Index only items that have the keys	Working-set queues (open orders)	Index holds a fraction of items
Narrow projection	Copy only needed attributes	The query’s read set	Lower storage + replication WCU
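The write-cost lever is easy to underestimate, so here is a rough model. It assumes standard writes at one WCU per 1 KB of index entry, that a write changing an index key is charged as a delete of the old entry plus a put of the new one (approximated here as two equal-size writes), and that a write touching neither the index keys nor any projected attribute costs the index nothing. With `ALL`, every attribute is projected, so every change pays. `index_wcu` is a hypothetical helper.

```python
import math

WCU_BLOCK_BYTES = 1024  # one standard WCU per 1 KB written


def index_wcu(projected_bytes: int, key_changed: bool = False) -> int:
    """WCU one base-table write costs on a GSI that must be updated.

    projected_bytes: size of the index entry (keys plus projected attributes).
    key_changed: changing an index key deletes the old entry and puts a new one,
    modeled here as two writes of the same size.
    """
    one_write = math.ceil(projected_bytes / WCU_BLOCK_BYTES)
    return 2 * one_write if key_changed else one_write


# A ~120-byte KEYS_ONLY entry vs. a 6 KB order projected with ALL,
# and the same ALL projection when a status change moves the GSI key.
print(index_wcu(120), index_wcu(6 * 1024), index_wcu(6 * 1024, key_changed=True))  # 1 6 12
```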

GSIs have their own provisioned throughput (or share the table’s on-demand capacity). Critically, under provisioned mode, if a GSI is throttled, base-table writes throttle too — an under-provisioned index becomes a write bottleneck for the whole table. The GSI behaviors that surprise people, and how to keep them from biting:

GSI behavior	The surprise	How to handle it
Eventually consistent only	No strongly-consistent GSI read exists	`GetItem` the base item if you need strong consistency
Separate throughput	Under-provision → base writes throttle	Size GSI WCU ≥ base write rate touching its keys
Projection replication	Every projected-attr write costs index WCU	Project narrowly; avoid `ALL` on hot items
Sparse by key presence	Item silently absent if a key attr is missing	Set/remove the key attribute deliberately
Backfill on creation	New GSI ignores existing rows	Backfill before querying (Section 8)

4. Modeling relationships: adjacency lists, hierarchies, many-to-many

Single-table design models relationships by placement, not by joins. The three relationship cardinalities each have a canonical encoding:

Relationship	Relational answer	DynamoDB encoding	Read it with
One-to-many	Foreign key + join	Parent + children share a PK (adjacency list)	One `Query` on the PK
Hierarchy / tree	Recursive self-join	Path in the SK (`A#B#C`)	`begins_with` on the path prefix
Many-to-many	Join table	Materialize edge + flip with a GSI	Base for one direction, GSI for the other

One-to-many (adjacency list). Already shown: parent and children share a partition (ORDER#o-9001 owns its ITEM# rows). One Query returns the aggregate.

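Hierarchies. Encode the path in the sort key. A category tree such as Electronics > Audio > Headphones stores SK = CATEGORY#Electronics#Audio#Headphones. begins_with(SK, "CATEGORY#Electronics#Audio") returns the Audio node and its whole subtree in one query, because lexicographic ordering on the delimited path mirrors the tree. End the prefix with the delimiter (CATEGORY#Electronics#Audio#) when a sibling such as Audiobooks could share the same leading text.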
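The prefix behavior, including the sibling-collision edge case, can be reproduced in memory. In this sketch `category_sk` and `begins_with` are hypothetical helpers; the second only emulates the key-condition function over a sorted list of one partition's sort keys.

```python
def category_sk(*path: str) -> str:
    """Sort key for a category node: the root-to-node path, '#'-delimited."""
    return "CATEGORY#" + "#".join(path)


def begins_with(sort_keys: list[str], prefix: str) -> list[str]:
    """Emulate begins_with(SK, prefix) within one partition, in sort-key order."""
    return [sk for sk in sorted(sort_keys) if sk.startswith(prefix)]


catalog = [
    category_sk("Electronics"),
    category_sk("Electronics", "Audio"),
    category_sk("Electronics", "Audio", "Headphones"),
    category_sk("Electronics", "Audio", "Speakers"),
    category_sk("Electronics", "Audiobooks"),  # a sibling whose name starts with "Audio"
]

# Node plus subtree, but the Audiobooks sibling leaks in as well.
audio_and_more = begins_with(catalog, category_sk("Electronics", "Audio"))

# A trailing delimiter restricts the result to Audio's descendants.
audio_descendants = begins_with(catalog, category_sk("Electronics", "Audio") + "#")
```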

Many-to-many. The relational answer is a join table; the DynamoDB answer is to materialize both directions of the edge and flip them with a GSI. For users-in-groups: store a membership item, then use GSI1 to invert PK and SK.

PK              SK              GSI1PK          GSI1SK
--------------- --------------- --------------- ---------------
USER#u1         GROUP#g1        GROUP#g1        USER#u1
USER#u1         GROUP#g2        GROUP#g2        USER#u1
USER#u2         GROUP#g1        GROUP#g1        USER#u2

“Which groups is user u1 in?” -> base table, PK = USER#u1 AND begins_with(SK, "GROUP#").
“Which users are in group g1?” -> GSI1, GSI1PK = GROUP#g1.
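The "one item, two directions" claim is easy to see in a small emulation. The helpers below are hypothetical and run against a plain list standing in for the table: `membership_item` writes the base keys and the flipped GSI1 keys from the same two values, and `query` mimics equality on a partition key plus an optional `begins_with` on the sort key.

```python
def membership_item(user_id: str, group_id: str) -> dict[str, str]:
    """One stored item per edge; GSI1 holds the same edge with PK and SK swapped."""
    user, group = f"USER#{user_id}", f"GROUP#{group_id}"
    return {"PK": user, "SK": group, "GSI1PK": group, "GSI1SK": user}


def query(items: list[dict], pk_attr: str, pk_value: str, sk_attr: str, sk_prefix: str = "") -> list[str]:
    """Emulate a Query: equality on the partition key, optional begins_with on the sort key."""
    return sorted(
        item[sk_attr]
        for item in items
        if item.get(pk_attr) == pk_value and sk_attr in item and item[sk_attr].startswith(sk_prefix)
    )


table = [membership_item("u1", "g1"), membership_item("u1", "g2"), membership_item("u2", "g1")]

groups_of_u1 = query(table, "PK", "USER#u1", "SK", "GROUP#")  # base table direction
users_in_g1 = query(table, "GSI1PK", "GROUP#g1", "GSI1SK")    # GSI1, the flipped direction
```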

One item, two access directions, no second write to keep in sync. When the relationship carries denormalized data (a group name shown on the user’s view), accept the duplication and reconcile it with DynamoDB Streams (Section 8) rather than reading two items per query. Denormalization is a deliberate trade — copy data to save a read, then keep the copies honest:

Denormalization decision	Read-time benefit	Write-time cost	Reconcile with
Copy group name onto membership item	No second read for the label	Update fan-out on rename	Streams Lambda updates copies
Store order total on customer row	Dashboard avoids summing items	Update on every item change	Transaction or Streams
Duplicate edge both directions	Both query directions are one call	Two writes (or one + GSI flip)	GSI flip needs no extra write
Keep a small “latest” summary item	Cheap dashboard read	One extra write per change	Streams maintains it
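For the first row of that table, the Streams consumer's job starts with spotting renames. A minimal sketch of that step, assuming the stream is configured with `NEW_AND_OLD_IMAGES` and that group rows use a hypothetical layout of `PK=GROUP#<gid>` with a string `name` attribute (the key map above does not define group rows). It reads the standard Lambda event shape (`Records[].eventName`, `Records[].dynamodb.OldImage/NewImage`).

```python
def group_renames(stream_event: dict) -> list[tuple[str, str]]:
    """Extract (group PK, new name) pairs from a DynamoDB Streams batch."""
    renames = []
    for record in stream_event.get("Records", []):
        if record.get("eventName") != "MODIFY":
            continue  # inserts and removes are not renames
        images = record["dynamodb"]
        old, new = images.get("OldImage", {}), images.get("NewImage", {})
        if not new.get("PK", {}).get("S", "").startswith("GROUP#"):
            continue  # not a group row
        old_name = old.get("name", {}).get("S")
        new_name = new.get("name", {}).get("S")
        if new_name is not None and new_name != old_name:
            renames.append((new["PK"]["S"], new_name))
    return renames
```

Each pair would then drive one GSI1 query (`GSI1PK = GROUP#<gid>`) to find the membership items holding the stale copy, followed by an `UpdateItem` per item; that part is left out because it needs a live table.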

5. Write sharding to avoid hot partitions and throttling

DynamoDB spreads data across partitions by hashing the partition key. Two failure modes follow: a single partition key taking disproportionate traffic (a hot key), and the hard physical ceiling — a single partition sustains roughly 1,000 WCU and 3,000 RCU. Exceed either and you throttle, even if table-level capacity looks healthy.

Adaptive capacity helps but does not excuse key design. DynamoDB shifts capacity toward busy partitions and can isolate a single hot item, but it cannot exceed those per-partition limits. A key that needs more than 1,000 WCU must spread across multiple physical partitions — which means more than one partition-key value.

Time-series keys are the classic trap. A PK of the current date sends every write today to one partition; yesterday’s is cold. If you must key on time, write-shard: append a calculated suffix to fan writes across N logical partitions.

import hashlib

SHARDS = 10  # tune to required WCU / 1000, rounded up

def shard_suffix(item_id: str, shards: int = SHARDS) -> int:
    # Deterministic so the read side can recompute it
    h = hashlib.md5(item_id.encode()).hexdigest()
    return int(h, 16) % shards

# write: PK = "ORDER#2026-06-08#<n>", where n = shard_suffix(order id) in 0..SHARDS-1
pk = f"ORDER#2026-06-08#{shard_suffix('o-9001')}"

The tradeoff is explicit: reading all of today’s orders now means N queries (PK = ORDER#2026-06-08#0 … #9) merged client-side. Sharding trades read fan-out for write throughput. Two ways to pick the suffix:

Suffix strategy	How it is computed	Read side	Best for
Calculated	Hash of a key attribute mod N	Recompute the exact shard for a point read	Read-by-ID workloads
Random	`random.randint(0, N-1)`	Scatter across all N shards	Write-then-batch-read workloads

Calculated suffix (above) — deterministic from a key attribute, so a point read recomputes the exact shard. Best when you read by ID.
Random suffix — random.randint(0, N-1). Maximizes spread, but you can only read by scattering across all N shards. Best for pure write-then-batch-read workloads.
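The read side of either strategy is a scatter-gather. A sketch under two assumptions: each shard's items come back from their own `Query` already sorted by sort key, and the sort key inside a shard is something like `<time>#<orderId>` (hypothetical; the article does not fix it). `heapq.merge` then interleaves the N sorted pages without re-sorting everything.

```python
import heapq


def shard_pks(day: str, shards: int) -> list[str]:
    """Every partition key a day's writes can land on: ORDER#<day>#0 .. #<shards-1>."""
    return [f"ORDER#{day}#{n}" for n in range(shards)]


def merge_shards(pages: list[list[dict]], sort_attr: str = "SK") -> list[dict]:
    """Merge per-shard Query results, each already sorted by sort key, into one ordered list."""
    return list(heapq.merge(*pages, key=lambda item: item[sort_attr]))


# Pretend results from querying three shards (in practice, issued concurrently).
pages = [
    [{"SK": "09:00#o-1"}, {"SK": "11:30#o-4"}],
    [{"SK": "10:15#o-2"}],
    [],  # an empty shard is normal at low volume
]
merged = merge_shards(pages)
```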

Sizing the shard count is arithmetic, not guesswork — pick N from the peak write rate of the hottest key:

Required WCU on one logical key	Min shards (WCU / 1000, rounded up)	Read fan-out cost	Note
≤ 1,000	1 (no shard)	1 query	A single partition suffices
~2,500	3	3 queries merged	Round up, leave headroom
~9,000	9 (10 with headroom)	10 queries merged	The common default
~25,000	25	25 queries merged	Reconsider the key entirely
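The same arithmetic as a function, extended to check reads too. It uses the approximate per-partition ceilings from this article (~1,000 WCU, ~3,000 RCU); `min_shards` and its `headroom` parameter are hypothetical, and a 10% headroom is how the ~9,000 WCU case lands on 10 shards.

```python
import math

PARTITION_WCU = 1000  # approximate per-partition write ceiling
PARTITION_RCU = 3000  # approximate per-partition read ceiling


def min_shards(peak_wcu: float, peak_rcu: float = 0.0, headroom: float = 0.0) -> int:
    """Smallest shard count that keeps each shard under both per-partition ceilings.

    headroom: fractional safety margin, e.g. 0.1 sizes for 10% above the observed peak.
    """
    scale = 1.0 + headroom
    by_writes = math.ceil(peak_wcu * scale / PARTITION_WCU)
    by_reads = math.ceil(peak_rcu * scale / PARTITION_RCU)
    return max(1, by_writes, by_reads)
```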

The hot-partition symptoms and how to read them apart from genuine under-provisioning:

Signal	Hot partition	Genuinely under-provisioned
`ThrottledRequests`	> 0	> 0
`ConsumedWriteCapacityUnits` vs provisioned	Well below provisioned	At/above provisioned
Contributor Insights top key	One key dominates	Traffic spread evenly
Fix that works	Write-shard the key	Raise capacity / on-demand

Diagnose hot partitions with CloudWatch Contributor Insights for DynamoDB. You enable it per table (it is off by default), and it surfaces the most-accessed and most-throttled partition keys. ThrottledRequests on a table that is nowhere near its provisioned total is the signature of a hot key, not insufficient capacity.
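The signal table above reduces to a small decision function. The minimal sketch below assumes you have already pulled the three numbers from CloudWatch and Contributor Insights. The `busy_ratio` and `dominant_share` thresholds are illustrative assumptions, not AWS-published values, and `TableSignals`/`diagnose` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TableSignals:
    throttled_requests: int   # ThrottledRequests summed over the window
    consumed_wcu: float       # ConsumedWriteCapacityUnits as a per-second rate (Sum / period seconds)
    provisioned_wcu: float    # provisioned WCU (or the recent peak, for an on-demand table)
    top_key_share: float      # Contributor Insights: top key's share of traffic, 0..1


def diagnose(s: TableSignals, busy_ratio: float = 0.8, dominant_share: float = 0.5) -> str:
    """Apply the hot-partition vs under-provisioned table; thresholds are illustrative."""
    if s.throttled_requests == 0:
        return "healthy"
    utilization = s.consumed_wcu / s.provisioned_wcu if s.provisioned_wcu else 1.0
    if utilization < busy_ratio and s.top_key_share >= dominant_share:
        return "hot partition: write-shard the key"
    if utilization >= busy_ratio:
        return "under-provisioned: raise capacity or use on-demand"
    return "inconclusive: check per-key metrics in Contributor Insights"
```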

6. Capacity modes: on-demand vs provisioned with auto scaling

Two billing models, and the choice is about traffic shape, not just volume.

On-demand bills per request, scales automatically, and needs zero capacity planning. It keeps prior peaks warm, so it can instantly absorb up to double the previous high-water mark. A sudden jump well beyond that (a cold 10× spike, say) can still throttle briefly while capacity catches up. Use it for new tables (unknown traffic), spiky workloads, and dev/test.

Provisioned reserves RCU/WCU and is materially cheaper per request for steady, predictable load, but you pay for that capacity whether you use it or not. Pair it with Application Auto Scaling, which tracks a target utilization (typically 70%) between a min and max. It reacts on CloudWatch-alarm timescales (usually several minutes), so it rides gentle diurnal curves well and absorbs sharp spikes poorly.

Dimension	On-demand	Provisioned + auto scaling
Billing	Per request (RRU/WRU)	Per provisioned RCU/WCU-hour
Capacity planning	None	Set min/max + target %
Spike response	Instant up to ~2× prior peak	Minutes (alarm-driven)
Cost at steady high load	Higher per request	Lower (esp. with reserved)
Cost at low/idle	Pay only for use	Pay the provisioned floor
Best for	New, spiky, dev/test	Predictable, steady, diurnal
Switch frequency	—	Once per 24h between modes
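To make "cheaper at steady load" concrete, here is a back-of-envelope comparison for writes. Prices are deliberately left as parameters; look up current figures for your region. The 730 hours/month and the sizing of provisioned capacity at peak / 70% target are simplifying assumptions, and reserved-capacity discounts are ignored.

```python
HOURS_PER_MONTH = 730  # simplifying assumption


def monthly_on_demand(avg_writes_per_sec: float, price_per_million_wru: float) -> float:
    """On-demand: pay per write request unit actually used (1 WRU = one write up to 1 KB)."""
    wru = avg_writes_per_sec * 3600 * HOURS_PER_MONTH
    return wru / 1_000_000 * price_per_million_wru


def monthly_provisioned(peak_writes_per_sec: float, price_per_wcu_hour: float,
                        target_utilization: float = 0.70) -> float:
    """Provisioned: pay for WCU-hours whether used or not.

    Assumes capacity is held at peak / target utilization all month (no scale-in),
    which is the pessimistic case for a spiky workload.
    """
    provisioned_wcu = peak_writes_per_sec / target_utilization
    return provisioned_wcu * HOURS_PER_MONTH * price_per_wcu_hour
```

With toy prices, a steady 100 writes/s favors provisioned. An average of 10 writes/s with a 500 writes/s peak favors on-demand, which matches the "traffic shape, not volume" rule.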

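# Provisioned table with target-tracking auto scaling (Terraform)
resource "aws_dynamodb_table" "main" {
  name           = "app-main"
  billing_mode   = "PROVISIONED"
  hash_key       = "PK"
  range_key      = "SK"
  read_capacity  = 50
  write_capacity = 50

  # HCL allows only one argument in a single-line block, so each attribute is multi-line.
  attribute {
    name = "PK"
    type = "S"
  }
  attribute {
    name = "SK"
    type = "S"
  }

  # Auto scaling owns write capacity after creation; stop Terraform reverting it.
  lifecycle {
    ignore_changes = [write_capacity]
  }
}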

resource "aws_appautoscaling_target" "write" {
  service_namespace  = "dynamodb"
  resource_id        = "table/${aws_dynamodb_table.main.name}"
  scalable_dimension = "dynamodb:table:WriteCapacityUnits"
  min_capacity       = 50
  max_capacity       = 2000
}

resource "aws_appautoscaling_policy" "write" {
  name               = "write-target-70"
  service_namespace  = aws_appautoscaling_target.write.service_namespace
  resource_id        = aws_appautoscaling_target.write.resource_id
  scalable_dimension = aws_appautoscaling_target.write.scalable_dimension
  policy_type        = "TargetTrackingScaling"

  target_tracking_scaling_policy_configuration {
    target_value = 70.0
    predefined_metric_specification {
      predefined_metric_type = "DynamoDBWriteCapacityUtilization"
    }
  }
}

The auto-scaling knobs and sensible starting points, so the table reacts without thrashing:

Auto-scaling setting	What it controls	Starting point	Trade-off
Target utilization	Headroom above current use	70%	Lower = more headroom, more cost
Min capacity	The floor (buy as reserved)	Your baseline	Too low → throttle on the ramp
Max capacity	Hard ceiling	2–4× baseline	Too low → throttle at peak
Scale-out cooldown	Wait before scaling up again	Short (seconds–1 min)	Too long → slow to absorb a ramp
Scale-in cooldown	Wait before scaling down	Longer (minutes)	Too short → flap on jitter

For a stable baseline, reserved capacity discounts that floor in exchange for a one- or three-year commitment — buy it for the min, let auto scaling handle the rest. You can switch a table between on-demand and provisioned only once every 24 hours, so the mode is not a runtime knob. Match the capacity mode to the traffic shape with this decision table:

If your traffic is…	Then choose…	Because
Brand new / unknown	On-demand	No data to size provisioned from
Spiky / unpredictable	On-demand	Instant scale, no throttle on bursts
Steady with a diurnal curve	Provisioned + auto scaling	Cheaper per request, scaling rides the curve
Steady with a known floor	Provisioned + reserved for the floor	Reserved discount on guaranteed baseline
Flash-sale / sharp 10× spikes	On-demand (or pre-scaled provisioned)	Auto scaling can’t react fast enough

7. Transactions, condition expressions, and optimistic concurrency

A PutItem is atomic for one item and immediately visible to strongly-consistent reads on the base table. The harder guarantees come from three tools.

Condition expressions make a write conditional and reject it atomically otherwise. The most important one prevents blind overwrites: attribute_not_exists(PK) makes PutItem an insert, failing with ConditionalCheckFailedException if the item already exists. The functions and operators you compose conditions from:

Condition function / operator	Meaning	Canonical use
`attribute_not_exists(PK)`	Item/attr does not exist	Insert-only (no overwrite)
`attribute_exists(PK)`	Item/attr exists	Update-only (must already be there)
`attribute_type(a, :t)`	Attribute is of a type	Defensive schema checks
`begins_with(a, :p)`	String prefix	Guard on a structured value
`<`, `<=`, `=`, `>=`, `>`, `<>`	Comparisons	Version / counter guards
`AND`, `OR`, `NOT`	Boolean composition	Multi-condition guards
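In application code, the insert-only guard usually shows up as a caught exception rather than an error path. The minimal sketch below assumes a boto3 low-level client is passed in and that `item` is in DynamoDB JSON. `put_if_absent` is an illustrative name.

```python
def put_if_absent(client, table: str, item: dict) -> bool:
    """Insert-only PutItem: return False instead of overwriting an existing item."""
    try:
        client.put_item(
            TableName=table,
            Item=item,
            # Evaluated against the item with this exact PK+SK; fails if it exists.
            ConditionExpression="attribute_not_exists(PK)",
        )
        return True
    except client.exceptions.ConditionalCheckFailedException:
        # The write was rejected atomically (and still consumed WCU).
        return False
```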

Optimistic concurrency uses a version attribute so a lost update is rejected rather than silently clobbered:

# Update only if version is unchanged; bump it in the same call
aws dynamodb update-item \
  --table-name app-main \
  --key '{"PK":{"S":"ORDER#o-9001"},"SK":{"S":"META"}}' \
  --update-expression "SET #st = :new, version = :nextv" \
  --condition-expression "version = :curv" \
  --expression-attribute-names '{"#st":"status"}' \
  --expression-attribute-values '{":new":{"S":"PAID"},":curv":{"N":"7"},":nextv":{"N":"8"}}'

If another writer bumped version to 8 first, the condition version = 7 fails; you re-read and retry. No locks, no contention beyond the conflicting writers.
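The re-read-and-retry loop is worth writing once and reusing. The minimal sketch below keeps the storage calls abstract. `read` and `conditional_write` are hypothetical adapters you would back with `get_item` and an `update_item` carrying `ConditionExpression="version = :curv"`. That adapter would translate `ConditionalCheckFailedException` into the `VersionConflict` defined here.

```python
class VersionConflict(Exception):
    """Raised by conditional_write when the stored version no longer matches."""


def update_with_retry(read, conditional_write, apply_change, max_attempts: int = 5) -> dict:
    """Optimistic read-modify-write guarded by a version attribute.

    read() -> current item (dict with an integer "version")
    conditional_write(new_item, expected_version) -> None, or raises VersionConflict
    apply_change(item) -> changed item (the version bump is handled here)
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    last_error = None
    for _ in range(max_attempts):
        current = read()
        new_item = apply_change(dict(current))
        new_item["version"] = current["version"] + 1
        try:
            conditional_write(new_item, current["version"])
            return new_item
        except VersionConflict as err:
            last_error = err  # another writer won; re-read and re-apply on the new version
    raise last_error
```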

TransactWriteItems gives all-or-nothing across up to 100 items (and multiple tables), each with its own condition. This is how A6 inserts an order and its line items atomically:

{
  "TransactItems": [
    { "Put": {
        "TableName": "app-main",
        "Item": {"PK":{"S":"CUST#a1b2"},"SK":{"S":"ORDER#2026-06-08#o-9100"},"status":{"S":"OPEN"}},
        "ConditionExpression": "attribute_not_exists(PK)"
    }},
    { "Put": {
        "TableName": "app-main",
        "Item": {"PK":{"S":"ORDER#o-9100"},"SK":{"S":"ITEM#001"},"sku":{"S":"ABC"}}
    }}
  ]
}
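The same request can be assembled in code for any number of line items. The minimal sketch below assumes the key layout shown in the JSON, and `order_transaction` is an illustrative name. It returns keyword arguments for boto3's `client.transact_write_items(**req)`, and it fails fast before exceeding the 100-action limit.

```python
MAX_TRANSACT_ITEMS = 100  # TransactWriteItems action limit cited above


def order_transaction(table: str, customer_id: str, order_id: str, date: str,
                      skus: list[str]) -> dict:
    """Build a TransactWriteItems request: one insert-only order plus its line items."""
    actions = [{
        "Put": {
            "TableName": table,
            "Item": {
                "PK": {"S": f"CUST#{customer_id}"},
                "SK": {"S": f"ORDER#{date}#{order_id}"},
                "status": {"S": "OPEN"},
            },
            # If this order already exists, the whole transaction is canceled.
            "ConditionExpression": "attribute_not_exists(PK)",
        }
    }]
    for n, sku in enumerate(skus, start=1):
        actions.append({
            "Put": {
                "TableName": table,
                "Item": {
                    "PK": {"S": f"ORDER#{order_id}"},
                    "SK": {"S": f"ITEM#{n:03d}"},  # zero-padded so items sort numerically
                    "sku": {"S": sku},
                },
            }
        })
    if len(actions) > MAX_TRANSACT_ITEMS:
        raise ValueError(f"{len(actions)} actions exceeds the {MAX_TRANSACT_ITEMS}-item limit")
    return {"TransactItems": actions}
```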

Two costs to internalize: a transactional write consumes 2x the WCU of the same non-transactional write (prepare plus commit), and a transaction fails entirely if any condition fails or it collides with another transaction on the same item (TransactionCanceledException, with per-item reasons). The four write-consistency tools, side by side, so you reach for the cheapest one that gives the guarantee:

Tool	Guarantee	Cost	Use when
Plain `PutItem`/`UpdateItem`	Single-item atomic	1× WCU	No cross-item invariant
Condition expression	Atomic conditional (insert/guard)	1× WCU (failed write still charges)	Prevent overwrite / enforce a predicate
Optimistic concurrency (version)	No lost update	1× WCU + retry on conflict	Concurrent updates to one item
`TransactWriteItems`	All-or-nothing across ≤100 items	2× WCU	Multi-item invariant (order + items)
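When a transaction is canceled, the per-item reasons tell you whether a retry can succeed. In boto3 the list is typically found at `err.response["CancellationReasons"]`, one entry per action in the order you sent them. The minimal sketch below summarizes that list. The retryable-code set is an assumption drawn from this article's exception table, and `classify_cancellation` is an illustrative name.

```python
# Codes that describe contention or throttling rather than a false condition.
RETRYABLE_REASON_CODES = {"TransactionConflict", "ProvisionedThroughputExceeded", "ThrottlingError"}


def classify_cancellation(reasons: list[dict]) -> dict:
    """Summarize CancellationReasons[]; entries with Code "None" did not cause the cancel."""
    failed = [(i, r.get("Code")) for i, r in enumerate(reasons)
              if r.get("Code") not in (None, "None")]
    retryable = bool(failed) and all(code in RETRYABLE_REASON_CODES for _, code in failed)
    return {"failed": failed, "retryable": retryable}
```

A `ConditionalCheckFailed` anywhere makes the result non-retryable. Re-read first, exactly as with optimistic concurrency.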

Use transactions where you need the invariant; do not wrap every write in one.

8. Migrations and backfills: evolving the schema without downtime

Single-table schemas evolve constantly — a new access pattern means a new GSI or a derived attribute. DynamoDB is schemaless at the item level, so adding attributes needs no migration. The work is in indexes and backfills.

Adding a GSI is an online operation. UpdateTable with a GSI create returns immediately; DynamoDB backfills it in the background while the table stays fully available. The index reports CREATING then ACTIVE — do not query it until ACTIVE, and watch OnlineIndexPercentageProgress. A table allows at most 20 GSIs, with only one create or delete in flight at a time.

aws dynamodb update-table \
  --table-name app-main \
  --attribute-definitions \
      AttributeName=GSI3PK,AttributeType=S AttributeName=GSI3SK,AttributeType=S \
  --global-secondary-index-updates '[{
    "Create": {
      "IndexName": "GSI3",
      "KeySchema": [
        {"AttributeName":"GSI3PK","KeyType":"HASH"},
        {"AttributeName":"GSI3SK","KeyType":"RANGE"}
      ],
      "Projection": {"ProjectionType":"INCLUDE","NonKeyAttributes":["status","total"]}
    }
  }]'
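Gating application code on the new index can be automated. For each GSI, `describe_table` reports `IndexStatus` and a `Backfilling` flag, while `OnlineIndexPercentageProgress` is the CloudWatch view of the same progress. The minimal polling sketch below assumes a boto3 low-level client is passed in; the poll interval and timeout are arbitrary defaults.

```python
import time


def wait_for_gsi_active(client, table: str, index: str,
                        poll_seconds: float = 30.0, timeout_seconds: float = 6 * 3600,
                        sleep=time.sleep) -> None:
    """Poll DescribeTable until the named GSI is ACTIVE and no longer backfilling."""
    waited = 0.0
    while True:
        desc = client.describe_table(TableName=table)["Table"]
        gsis = {g["IndexName"]: g for g in desc.get("GlobalSecondaryIndexes", [])}
        gsi = gsis.get(index)
        if gsi is None:
            raise LookupError(f"{index} not found on {table}")
        if gsi["IndexStatus"] == "ACTIVE" and not gsi.get("Backfilling", False):
            return
        if waited >= timeout_seconds:
            raise TimeoutError(f"{index} still {gsi['IndexStatus']} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```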

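A new GSI contains only items that already carry GSI3PK/GSI3SK. Existing items stay invisible to it until you write those attributes, which is what a backfill does. The command above assumes an on-demand table; on a provisioned table the Create block also needs a ProvisionedThroughput entry. The migration techniques, and when each is the right tool: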

Technique	Touches live capacity?	Best for	Watch-out
Add attributes (schemaless)	Minimal	New optional fields	Old items lack the field
Online GSI create	Background backfill	New access pattern on new attrs	Query only after `ACTIVE`
Parallel `Scan` + `UpdateItem`	Yes — rate-limit it	Backfilling new key attrs	Throttle; make idempotent
DynamoDB Streams → Lambda	Incremental, ongoing	Continuous reshaping/denorm	At-least-once; dedupe
Export to S3 + transform + `BatchWriteItem`	Reads: no (export uses no RCU); the write-back does consume WCU	One-time bulk transform	Re-import path; eventual cutover

Backfill with a throttled job, not an unthrottled Scan-and-update loop that melts capacity. The pattern: a parallel Scan with Segment/TotalSegments, a transform for each item, and an UpdateItem that writes the new attributes, all rate-limited against provisioned capacity. AWS Glue, Step Functions, or a Lambda fan-out are the usual harnesses. Make the transform idempotent (a condition like attribute_not_exists(GSI3PK), so re-runs skip finished items) and write-shard the target if the new key would be hot.
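The pattern above fits in a few dozen lines. The minimal sketch below assumes a boto3 low-level client is passed in. `derive_keys` stands for a hypothetical, application-specific function that maps an item to its new `(GSI3PK, GSI3SK)` values, or to `None` to skip it. The per-segment sleep is a crude rate cap; a production job would budget against measured consumed capacity.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def backfill_segment(client, table: str, segment: int, total_segments: int,
                     derive_keys, max_writes_per_sec: float = 50.0,
                     sleep=time.sleep) -> int:
    """Backfill GSI3PK/GSI3SK for one Scan segment; returns the number of items updated."""
    updated, start_key = 0, None
    while True:
        kwargs = {"TableName": table, "Segment": segment,
                  "TotalSegments": total_segments, "Limit": 100}
        if start_key:
            kwargs["ExclusiveStartKey"] = start_key
        page = client.scan(**kwargs)
        for item in page.get("Items", []):
            keys = derive_keys(item)
            if keys is None:
                continue  # not an entity this index cares about
            try:
                client.update_item(
                    TableName=table,
                    Key={"PK": item["PK"], "SK": item["SK"]},
                    UpdateExpression="SET GSI3PK = :pk, GSI3SK = :sk",
                    # Idempotency: skip items an earlier run already backfilled.
                    ConditionExpression="attribute_not_exists(GSI3PK)",
                    ExpressionAttributeValues={":pk": {"S": keys[0]}, ":sk": {"S": keys[1]}},
                )
                updated += 1
            except client.exceptions.ConditionalCheckFailedException:
                pass  # already done; the failed check still consumed WCU
            sleep(1.0 / max_writes_per_sec)  # crude per-segment write-rate cap
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            return updated


def backfill(client, table: str, derive_keys, total_segments: int = 4, **kw) -> int:
    """Run all segments in parallel; total write rate is about segments x per-segment cap."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [pool.submit(backfill_segment, client, table, seg, total_segments,
                               derive_keys, **kw) for seg in range(total_segments)]
        return sum(f.result() for f in futures)
```

boto3's default retry configuration already backs off on throttling errors, so the sleep is an extra ceiling on top. For resumability, persist each segment's `LastEvaluatedKey` as a checkpoint (not shown).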

For continuous reshaping, use DynamoDB Streams. A Lambda reacts to every change to keep a denormalized copy or new index attribute current. This is the same machinery that reconciles the many-to-many duplication from Section 4. For a one-time bulk transform across a huge table, export to S3, transform with Athena or Glue, and BatchWriteItem the result back. The export is point-in-time: it requires point-in-time recovery to be enabled and consumes no read capacity, which keeps the read side of the migration off the live table. The write-back still consumes WCU, so rate-limit it. A backfill is correct only if it is safe to re-run. The idempotency checklist:

Backfill property	Why it matters	How to ensure it
Idempotent	Re-runs and overlaps must not double-apply	Condition `attribute_not_exists(GSI3PK)`
Rate-limited	A full-speed `Scan` melts provisioned capacity	Cap WCU/RCU; use `Limit`; back off on throttle
Reconcilable	You must prove it finished	`--select COUNT` old vs new agrees
Resumable	Big tables take hours	Segment-based parallel `Scan` checkpoints
Off-path for huge tables	Don’t compete with live traffic	Export to S3, transform, re-import
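For the continuous Streams path, the handler's core job is deciding which changed items still need the new key, while tolerating at-least-once delivery. The minimal sketch below is a pure function over a batch of stream records. It assumes the stream view type includes new images (`NEW_IMAGE` or `NEW_AND_OLD_IMAGES`), and `derive_keys` is the same hypothetical application function as in the backfill. A handler would pass `event["Records"]` in and then issue the same conditional `update_item` as the backfill for each result.

```python
def pending_key_updates(records: list[dict], derive_keys) -> dict:
    """From one Streams batch, map (PK, SK) -> (GSI3PK, GSI3SK) for items still lacking GSI3PK.

    Redelivered duplicates collapse onto the same primary key. An image that
    already carries GSI3PK (including our own earlier write) cancels any pending entry,
    which also stops the handler from reacting to its own updates forever.
    """
    pending = {}
    for rec in records:
        if rec.get("eventName") not in ("INSERT", "MODIFY"):
            continue  # REMOVE events have nothing to backfill
        image = rec.get("dynamodb", {}).get("NewImage")
        if not image:
            continue  # stream view type does not include new images
        pk_sk = (image["PK"]["S"], image["SK"]["S"])
        if "GSI3PK" in image:
            pending.pop(pk_sk, None)
            continue
        keys = derive_keys(image)
        if keys is not None:
            pending[pk_sk] = keys  # latest image in the batch wins
    return pending
```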

Architecture at a glance

The diagram traces a single-table design the way the data and control actually move, left to right, and pins the five classic failure points onto the exact node where each bites. Start at the left: your service code holds the access-pattern list (A1–A7) and issues Query/GetItem (never Scan) against the base table app-main, whose overloaded PK=CUST#/SK=ORDER# keys let a customer, their orders, and the orders’ line items share one item collection so a single query returns the aggregate. Reads that need a different shape hit the secondary indexes — an overloaded GSI1 (CUST#STATUS, INCLUDE projection) and a sparse GSI2 holding only OPEN orders as a working-set queue. Writes flow down the write path: high-volume keys go through a write-shard (#0..#9) to fan across physical partitions, multi-item invariants use TransactWriteItems (≤100 items, 2× WCU), and everything is encrypted at rest with a KMS CMK. Finally, DynamoDB Streams drives a Lambda that reconciles duplicates and backfills new GSI keys, while CloudWatch Contributor Insights watches for hot keys and ThrottledRequests.

Read the numbered badges as the diagnostic map laid over that architecture. Badge 1 sits on the physical partition — the per-partition ~1,000 WCU / ~3,000 RCU ceiling where a hot key throttles while the table looks idle. Badge 2 sits on the sparse GSI, where an under-provisioned index throttles the base table’s writes. Badge 3 sits on the item collection, the 400 KB-per-item limit you hit by appending to an unbounded array. Badge 4 sits on the Streams/ETL node, the backfill gap where a new GSI silently misses historical rows. Badge 5 sits back on the service code, where a new access pattern degrades into a Scan. The legend narrates each as symptom · confirm · fix — the same method as the playbook section below: localize the failure to a node, confirm with the named metric or exception, apply the keys-or-capacity fix.

Real-world scenario

Northwind Logistics runs a parcel-tracking platform on a single DynamoDB table tracking-main, keyed PK = SHIPMENT#<id>, SK = EVENT#<timestamp>, with a customer-and-shipments item collection alongside. On-demand capacity, point-in-time recovery on, a sparse GSI for “in-transit shipments,” and a Streams Lambda feeding an OpenSearch index for the support console. Average load is 3,000 writes/second of scan events; the data team is three engineers and the table had run clean for two years.

Peak season broke it on a single Monday. Two failures hit at once. First, a handful of mega-shipments (palletized freight with tens of thousands of scanned parcels) accumulated their events under one PK = SHIPMENT#<id>, and those partitions began throwing ThrottledRequests while the table sat at roughly 40% of its on-demand high-water mark. The on-call engineer’s reflex was to assume under-provisioning and consider raising limits, but the table was nowhere near its ceiling. Contributor Insights told the truth: three shipment IDs dominated the most-throttled-key list. This was the per-partition ~1,000 WCU ceiling on a hot key, not table capacity. Second, a per-shipment rollup item that appended each milestone to an events array started failing writes with ValidationException: Item size has exceeded the maximum allowed size. It had finally crossed 400 KB.

The breakthrough was naming the two failures precisely instead of reaching for the capacity slider. The hot partition was a key-design problem; the 400 KB error was a modeling problem. Neither is fixed by more capacity. The team confirmed the hot key with Contributor Insights and with the gap between consumed capacity and the table’s previous peak (an on-demand table has no provisioned figure to compare against). They confirmed the item-size failure by logging item bytes before each rollup write.

The fix landed in two changes, both deployable without downtime. First, write-shard the event partition for high-volume shipments only — PK = SHIPMENT#<id>#<shard> with a calculated suffix from the event ID, fanning hot shipments across 10 partitions while small shipments stayed single-partition so their reads remained one query. Full-history reads for a mega-shipment became 10 parallel queries merged client-side — acceptable because that read was rare and the per-event writes were the hot path.

import hashlib

# Shard only high-volume shipments; keep small ones single-partition
# so their reads stay a single query.
def event_pk(shipment_id: str, event_id: str, high_volume: bool) -> str:
    if not high_volume:
        return f"SHIPMENT#{shipment_id}"
    shard = int(hashlib.md5(event_id.encode()).hexdigest(), 16) % 10
    return f"SHIPMENT#{shipment_id}#{shard}"

Second, they stopped appending to the rollup array. Each milestone became its own item under the adjacency-list SK (SK = MILESTONE#<seq>), sidestepping the 400 KB ceiling entirely. Meanwhile, a Streams Lambda maintained a small fixed-size “latest status” summary item for the dashboard read. The next peak ran at 3,400 writes/second with zero ThrottledRequests and no ValidationException. The on-demand bill actually dropped slightly, largely because milestone writes no longer rewrote an ever-growing rollup item. A write is billed on item size (1 WRU per 1 KB), so each append to a rollup approaching 400 KB had cost hundreds of write units.

The incident as a timeline, because the order of moves is the lesson:

Time	Symptom	Action taken	Effect	What it should have been
Mon 09:00	`ThrottledRequests` climbing	(alert fires)	—	Ask: hot key or under-provisioned?
09:10	Throttling at 40% of peak	Considered raising limits	Would not have helped	Check consumed vs provisioned first
09:25	Still throttling	Opened Contributor Insights	3 shipment IDs dominate	This was the breakthrough
09:40	Rollup writes failing	Logged item bytes pre-write	Item > 400 KB confirmed	Two distinct root causes named
11:00	Mitigated	Write-shard high-volume shipments	Hot partitions clear	Correct day-of fix
+3 days	Fixed	Milestones as items + Streams summary	0 throttles, 0 size errors	The actual fix is modeling

The lesson on the wall: adaptive capacity had silently smoothed the skew for two years, so the team assumed the key design was fine. Hot-partition risk is a function of the busiest key, not the average — and it stays invisible until the day it isn’t.

Advantages and disadvantages

Single-table design both enables DynamoDB’s single-digit-millisecond reads at any scale and demands discipline most relational engineers have to unlearn. Weigh it honestly:

Advantages (why it wins)	Disadvantages (why it bites)
One `Query` returns a parent and all its children — no joins, flat latency at any table size	The model is rigid: a new access pattern can mean a new GSI or a backfill, not a free ad-hoc query
Co-located item collections eliminate multi-round-trip reads	Item collections concentrate writes — co-location is also hot-key risk
Overloaded GSIs serve many patterns within the 20-index cap	The keys are opaque (`PK`/`SK` with prefixes) — harder to read than named columns
Sparse indexes hold only the working set, so ops queries touch a fraction of data	Forgetting to remove a sparse key leaves stale items in the index
Capacity is per-request and predictable; you pay for the traffic shape you have	Per-partition ~1,000 WCU / ~3,000 RCU ceiling is invisible until a hot key hits it
Transactions and condition expressions give write-time invariants without locks	Transactions cost 2× WCU and fail the whole batch on one conflict
Schemaless items make adding attributes free	The 400 KB item limit is absolute — append-style data must be re-modeled

The model is right for high-scale, well-understood workloads where the access patterns are knowable up front and read latency must stay flat as data grows — SaaS, commerce, event logs, graphs. It is the wrong default for exploratory analytics, ad-hoc reporting, or workloads whose query shapes change weekly; those want a relational store or a lakehouse, and DynamoDB feeds them via Streams or S3 export. The disadvantages are all manageable — but only if you know they exist before you name the first key, which is the entire point of working backward from access patterns.

Hands-on lab

Build the order-management model end to end, prove every access pattern is a single Query/GetItem, watch the sparse GSI hold only the working set, and confirm consumed capacity — all on-demand and free-tier-friendly (delete at the end). Run in any shell with the AWS CLI configured.

Step 1 — Create the table with two overloaded GSIs (on-demand).

aws dynamodb create-table \
  --table-name app-main \
  --attribute-definitions \
    AttributeName=PK,AttributeType=S AttributeName=SK,AttributeType=S \
    AttributeName=GSI1PK,AttributeType=S AttributeName=GSI1SK,AttributeType=S \
    AttributeName=GSI2PK,AttributeType=S AttributeName=GSI2SK,AttributeType=S \
  --key-schema AttributeName=PK,KeyType=HASH AttributeName=SK,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST \
  --global-secondary-indexes '[
    {"IndexName":"GSI1","KeySchema":[{"AttributeName":"GSI1PK","KeyType":"HASH"},{"AttributeName":"GSI1SK","KeyType":"RANGE"}],"Projection":{"ProjectionType":"INCLUDE","NonKeyAttributes":["status","total"]}},
    {"IndexName":"GSI2","KeySchema":[{"AttributeName":"GSI2PK","KeyType":"HASH"},{"AttributeName":"GSI2SK","KeyType":"RANGE"}],"Projection":{"ProjectionType":"KEYS_ONLY"}}
  ]'
aws dynamodb wait table-exists --table-name app-main

Expected: the command returns table metadata; wait blocks until ACTIVE.

Step 2 — Seed a customer, two orders, and a line item (entity overloading).

aws dynamodb put-item --table-name app-main --item '{"PK":{"S":"CUST#a1b2"},"SK":{"S":"PROFILE"},"name":{"S":"Acme Co"},"tier":{"S":"GOLD"}}'
aws dynamodb put-item --table-name app-main --item '{"PK":{"S":"CUST#a1b2"},"SK":{"S":"ORDER#2026-06-01#o-9001"},"status":{"S":"OPEN"},"total":{"N":"149.00"},"GSI1PK":{"S":"CUST#a1b2#OPEN"},"GSI1SK":{"S":"2026-06-01#o-9001"},"GSI2PK":{"S":"OPEN"},"GSI2SK":{"S":"2026-06-01#o-9001"}}'
aws dynamodb put-item --table-name app-main --item '{"PK":{"S":"CUST#a1b2"},"SK":{"S":"ORDER#2026-06-03#o-9044"},"status":{"S":"SHIPPED"},"total":{"N":"72.50"},"GSI1PK":{"S":"CUST#a1b2#SHIPPED"},"GSI1SK":{"S":"2026-06-03#o-9044"}}'
aws dynamodb put-item --table-name app-main --item '{"PK":{"S":"ORDER#o-9001"},"SK":{"S":"ITEM#001"},"sku":{"S":"ABC"},"qty":{"N":"2"}}'

Note the SHIPPED order has no GSI2PK — that is the sparse index doing its job.

Step 3 — A2: all orders for the customer, newest first.

aws dynamodb query --table-name app-main \
  --key-condition-expression "PK = :pk AND begins_with(SK, :p)" \
  --expression-attribute-values '{":pk":{"S":"CUST#a1b2"},":p":{"S":"ORDER#"}}' \
  --no-scan-index-forward --return-consumed-capacity TOTAL

Expected: two order items, the 2026-06-03 one first; ConsumedCapacity a fraction of an RCU.

Step 4 — A3: an order with its line items (adjacency list).

aws dynamodb query --table-name app-main \
  --key-condition-expression "PK = :pk" \
  --expression-attribute-values '{":pk":{"S":"ORDER#o-9001"}}'

Expected: the ITEM#001 row returned by one query keyed on the order.

Step 5 — A5: all OPEN orders via the sparse GSI, COUNT only.

aws dynamodb query --table-name app-main --index-name GSI2 \
  --key-condition-expression "GSI2PK = :open" \
  --expression-attribute-values '{":open":{"S":"OPEN"}}' --select COUNT

Expected: Count = 1 — only the OPEN order is in the index, proving sparseness.

Step 6 — A7: ship the order and watch it leave the sparse GSI.

aws dynamodb update-item --table-name app-main \
  --key '{"PK":{"S":"CUST#a1b2"},"SK":{"S":"ORDER#2026-06-01#o-9001"}}' \
  --update-expression "SET #s = :sh, GSI1PK = :g1 REMOVE GSI2PK, GSI2SK" \
  --expression-attribute-names '{"#s":"status"}' \
  --expression-attribute-values '{":sh":{"S":"SHIPPED"},":g1":{"S":"CUST#a1b2#SHIPPED"}}'
# Re-run Step 5: Count is now 0; the item dropped out of the working set.
# GSI1PK moves with the status so the overloaded CUST#STATUS index stays correct.

Validation checklist. You modeled multiple entities in one table, served three different access patterns with single Query calls, saw the sparse GSI hold exactly the open working set, and watched a status update remove an item from that index — all without a Scan. The lab steps mapped to what each proves:

Step	What you did	What it proves	Real-world analogue
1	Create table + 2 overloaded GSIs	One table serves many access shapes	Greenfield schema bring-up
2	Seed customer/orders/items	Entity overloading co-locates entities	Modeling the domain
3–4	Query by customer, then by order	Item collections = one-query aggregates	The high-frequency read paths
5	COUNT on the sparse GSI	Sparse index holds only the working set	The ops dashboard query
6	Ship order, remove GSI keys	Sparse keys are set/removed deliberately	The status-transition write

Cleanup (avoid lingering charges).

aws dynamodb delete-table --table-name app-main

Cost note. On-demand bills per request; this lab is a handful of requests — effectively free, and well within the DynamoDB free tier (25 GB storage, 25 provisioned WCU/RCU if you used provisioned instead). Deleting the table stops all storage charges.

Common mistakes & troubleshooting

This is the playbook — the part you bookmark. First as a scannable table you can read mid-incident, then the same entries with the full confirm-command detail underneath.

#	Symptom	Root cause	Confirm (exact cmd / console path)	Fix
1	`ThrottledRequests` while table sits at ~40% of capacity	Hot partition — one PK over the per-partition ~1,000 WCU / ~3,000 RCU ceiling	Contributor Insights top key; `ConsumedWriteCapacityUnits` below provisioned	Write-shard the PK (`#0..#9`); fan-out reads
2	`ValidationException: Item size has exceeded the maximum allowed size`	An attribute (array) grew the item past 400 KB	Log item bytes before the write; inspect the offending item	Model each increment as its own item (adjacency list)
3	A query consumes far more RCU than rows returned	It is a `Scan` + `FilterExpression`, not a `Query`	`--return-consumed-capacity TOTAL` vs row count; confirm it’s a `Scan`	Add an overloaded GSI / redesign keys so `KCE` selects
4	Base-table writes throttle even though base capacity is healthy	An under-provisioned GSI throttles back onto the base table	`WriteThrottleEvents` on the index; per-index `ConsumedWCU`	Provision GSI WCU ≥ base write rate; narrow projection
5	New GSI returns partial/empty results for old data	Backfill gap — index only has items that already carry its keys	Index `Backfilling=true` / `OnlineIndexPercentageProgress` < 100	Wait for `ACTIVE`; run an idempotent throttled backfill
6	`ConditionalCheckFailedException` on every retry of an update	Optimistic-concurrency version moved under you	Compare the item’s `version` to the one you sent	Re-read, re-apply on the new version, retry
7	`TransactionCanceledException` under load	A condition failed or two transactions hit the same item	Read `CancellationReasons[]` per item	Narrow the transaction; add jittered retry; reduce contention
8	Sparse-GSI “queue” keeps growing, never drains	The sparse key isn’t removed on the state transition	`--select COUNT` keeps rising; inspect a “done” item still has `GSIxPK`	`REMOVE GSIxPK, GSIxSK` in the transition `UpdateItem`
9	Reads sometimes miss a just-written item	Read a GSI (eventually consistent) expecting strong consistency	The item exists on the base table but not yet in the GSI	`GetItem` the base item, or accept the propagation delay
10	`ProvisionedThroughputExceededException` bursts at peak	Provisioned + auto scaling can’t react fast enough to a spike	Throttling correlates with a sharp ramp while provisioned capacity stays flat (auto scaling has not caught up)	Switch to on-demand for spiky traffic, or pre-scale
11	Hot key after migrating to a new key — backfill itself throttles	A full-speed `Scan`-and-update backfill melts capacity	Throttling spikes only during the backfill job	Rate-limit the job; export-to-S3 + transform off the live table
12	`BETWEEN`/range query returns nothing or wrong rows	SK components ordered fine→coarse, or numbers stored as strings	Inspect the SK structure of returned vs expected items	Reorder SK coarse→fine; zero-pad or use ISO-8601

Before the expanded reasoning, the exception/error reference you scan first — the exact strings DynamoDB throws, what each means for a single-table model, and whether it is the client’s fault (retryable in-place) or a design fault:

Exception / error string	What it means	Retryable?	Likely single-table cause	First fix
`ProvisionedThroughputExceededException`	Request exceeded provisioned (or burst) capacity	Yes (SDK backs off)	Hot key, or under-provisioned	Shard the key; raise capacity / on-demand
`ThrottlingException`	Control-plane / on-demand throttling	Yes	Sharp spike past warmed capacity	Pre-warm; on-demand; jittered retry
`ConditionalCheckFailedException`	A condition expression evaluated false	No (re-read first)	Optimistic-concurrency version moved	Re-read, re-apply, retry
`TransactionCanceledException`	A transaction was canceled	Sometimes	A per-item condition failed / item conflict	Read `CancellationReasons[]`; narrow txn
`TransactionConflictException`	Another txn touched the same item	Yes (jitter)	Contention on a hot item	Shard contended item; backoff
`ValidationException` (item size)	“Item size has exceeded the maximum allowed size”	No	An attribute grew past 400 KB	Model increments as separate items
`ValidationException` (key)	Key/attribute type or missing key	No	Wrong type (`S` vs `N`), absent key attr	Fix the item shape / key definition
`ItemCollectionSizeLimitExceededException`	An LSI item collection passed 10 GB	No	Too much data under one PK with LSIs	Re-model; prefer GSIs over LSIs
`ResourceInUseException`	Table/index busy (e.g. another GSI op)	Yes (wait)	Two GSI creates/deletes at once	Serialize; one index op in flight
`LimitExceededException`	An account/table limit hit (e.g. 20 GSIs)	No	Too many GSIs / concurrent ops	Overload indexes; request a limit raise
`ProvisionedThroughputExceeded` on a GSI	A GSI hit its own throughput	Yes	Under-provisioned GSI throttling base	Size GSI WCU to base write rate
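The rows marked "Yes (jitter)" call for capped exponential backoff with full jitter. boto3 already retries throttling errors internally, so an app-level wrapper mainly matters for cases like `TransactionConflictException`. The minimal sketch below reads the error code from the botocore `ClientError` shape (`err.response["Error"]["Code"]`). The retryable set is an assumption taken from the table above. `ResourceInUseException` is left out because it needs a longer wait than jittered milliseconds, and `TransactionCanceledException` is left out because its reasons must be inspected first.

```python
import random
import time

RETRYABLE_CODES = {
    "ProvisionedThroughputExceededException",
    "ThrottlingException",
    "TransactionConflictException",
}


def is_retryable_dynamodb_error(err: Exception) -> bool:
    """True for error codes the table marks as retryable in place."""
    code = getattr(err, "response", {}).get("Error", {}).get("Code")
    return code in RETRYABLE_CODES


def retry_with_jitter(op, is_retryable, max_attempts: int = 6,
                      base: float = 0.05, cap: float = 2.0,
                      sleep=time.sleep, rand=random.uniform):
    """Call op(); on retryable errors, sleep uniform(0, min(cap, base * 2**attempt)) and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception as err:
            if not is_retryable(err) or attempt == max_attempts - 1:
                raise
            sleep(rand(0, min(cap, base * 2 ** attempt)))
```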

The expanded form, with the full reasoning for the entries that bite hardest:

1. ThrottledRequests while the table sits well below provisioned. Root cause: a hot partition — one partition key over the per-partition ~1,000 WCU / ~3,000 RCU ceiling. Adaptive capacity smooths skew but cannot exceed the per-partition limit. Confirm: CloudWatch Contributor Insights for DynamoDB surfaces the most-throttled partition key; ConsumedWriteCapacityUnits sits far below provisioned while ThrottledRequests > 0. Fix: write-shard the key (PK#0..#9 with a calculated suffix sized to required WCU / 1,000) so writes fan across physical partitions; reads scatter across the N shards and merge client-side.

2. ValidationException: Item size has exceeded the maximum allowed size. Root cause: an attribute — usually an append-style array (events, line items) — grew the whole item past the absolute 400 KB limit. Confirm: log the serialized item size immediately before the write; the offending item is near or over 400 KB. Fix: stop appending to one item; model each increment as its own item under an adjacency-list SK (MILESTONE#/ITEM#), and keep a small fixed-size “latest” summary item for cheap dashboard reads.

3. A query consumes far more RCU than the rows it returns. Root cause: the operation is a Scan + FilterExpression, which reads every item then discards most — capacity scales with table size, not result size. Confirm: --return-consumed-capacity TOTAL shows RCU vastly larger than the row count; the call is a Scan, not a Query. Fix: design keys or add an overloaded GSI so a KeyConditionExpression does the selection; every steady-state read must be one Query/GetItem.

4. Base-table writes throttle though base capacity looks healthy. Root cause: an under-provisioned GSI. In provisioned mode, a throttled index backpressures and throttles the base-table writes that touch its key attributes. Confirm: WriteThrottleEvents on the index dimension is non-zero while the base table’s consumed WCU is below provisioned. Fix: raise the GSI’s provisioned WCU to at least the base write rate that touches its keys (or move the table to on-demand, where its GSIs scale with it), and narrow the projection so fewer writes replicate.

5. A new GSI returns partial or empty results for historical data. Root cause: the backfill gap — a new GSI only contains items that already carry its key attributes; existing rows stay invisible until backfilled. Confirm: the index reports Backfilling=true / OnlineIndexPercentageProgress < 100; --select COUNT on old vs new index disagrees. Fix: do not query until ACTIVE; run an idempotent, throttled backfill (parallel Scan + UpdateItem with attribute_not_exists(GSIxPK)), or export-to-S3 + transform + BatchWriteItem.

6. ConditionalCheckFailedException on every retry of an update. Root cause: optimistic-concurrency conflict — another writer bumped the version attribute, so your version = :curv condition fails. Confirm: the item’s current version differs from the one you sent. Fix: re-read the item, re-apply your change on the new version, and retry; consider jittered backoff if conflicts are frequent.

7. TransactionCanceledException under load. Root cause: a transactional write failed because a per-item condition failed or two transactions collided on the same item. Confirm: read CancellationReasons[] in the error — each item reports ConditionalCheckFailed, TransactionConflict, or None. Fix: narrow the transaction to the items that truly need the invariant, add jittered retry, and reduce contention (shard the contended item or use optimistic concurrency for single-item updates).

8. The sparse-GSI “queue” grows forever and never drains. Root cause: the sparse key attribute is not removed on the state transition, so “done” items linger in the index. Confirm: --select COUNT on the index keeps rising; a completed item still carries GSIxPK. Fix: add REMOVE GSIxPK, GSIxSK to the transition UpdateItem (as in A7), so the item leaves the working set the moment it changes state.
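Here is a minimal sketch of that transition as boto3 UpdateItem kwargs. The key shape (PK=ORDER#<id>, SK=META), the GSI1PK/GSI1SK names, and the status values are hypothetical stand-ins. SET and REMOVE sit in one update expression, so the state change and the index exit happen atomically. `status` is a DynamoDB reserved word, which is why the sketch uses the `#st` placeholder.

```python
def close_order_request(table_name: str, order_id: str, shipped_at: str) -> dict:
    """UpdateItem kwargs: OPEN -> SHIPPED, and drop the item from the sparse GSI."""
    return {
        "TableName": table_name,
        "Key": {"PK": {"S": f"ORDER#{order_id}"}, "SK": {"S": "META"}},
        # New state and removal of the sparse keys in one atomic update.
        "UpdateExpression": "SET #st = :shipped, shippedAt = :ts REMOVE GSI1PK, GSI1SK",
        # Only transition orders that are still open; a replay fails the condition.
        "ConditionExpression": "#st = :open",
        "ExpressionAttributeNames": {"#st": "status"},
        "ExpressionAttributeValues": {
            ":shipped": {"S": "SHIPPED"},
            ":open": {"S": "OPEN"},
            ":ts": {"S": shipped_at},
        },
    }


# Usage: boto3.client("dynamodb").update_item(**close_order_request("app", "o-9", "2024-03-06T10:00:00Z"))
```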

Best practices

Write the access-pattern list first. Every read and write with its filter, sort, and cardinality, reviewed by the team, before any attribute is named. The schema is derived from it.
No Scan in the steady state. Every production access pattern must be one Query or GetItem. A Scan + FilterExpression is a modeling bug, not a tuning knob.
Overload keys and indexes. Generic PK/SK and GSIxPK/GSIxSK with type prefixes let many entities share the table and one index serve many patterns, which also keeps you inside the default quota of 20 GSIs per table.
Order sort-key components coarse-to-fine and use ISO-8601 timestamps so lexicographic ordering gives you BETWEEN ranges and newest-first for free.
Make working-set GSIs sparse, and remove the sparse key on the state transition so the index holds exactly the live set (open orders, in-flight jobs) and the ops query touches a fraction of the data.
Project narrowly. KEYS_ONLY or INCLUDE by default; reserve ALL for indexes whose queries genuinely need the whole item. You cannot shrink a projection in place.
Materialize many-to-many in both directions via a GSI key-flip, not a runtime second read; reconcile any denormalized copies with Streams.
Write-shard high-write keys (time-series, mega-aggregates) with a shard count sized to required WCU / 1,000, and accept the read fan-out as the deliberate trade.
Match capacity mode to traffic shape: on-demand for spiky/unknown; provisioned + auto scaling (+ a reserved floor) for steady, predictable load.
Enforce invariants at write time: attribute_not_exists for insert-only, a version attribute for optimistic concurrency, TransactWriteItems for multi-item atomicity — and remember transactions cost 2× WCU.
Never grow an item toward 400 KB. Append-style data is separate items, not an unbounded attribute.
Make backfills online, idempotent, throttled, and reconciled with a COUNT check; for huge tables run them off the live capacity via export-to-S3.
Enable Contributor Insights and alarm on the leading indicators — per-partition throttling, not just table-level capacity.
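For a hypothetical ORDER entity, the key conventions above (type prefixes, coarse-to-fine sort keys, ISO-8601 timestamps) look like this. This is a minimal sketch, and the entity, attribute values, and helper names are illustrative. The timestamp is normalized to fixed-width UTC so that string order and time order agree, which is what lets a BETWEEN range and a descending Query work on the sort key.

```python
from datetime import datetime, timezone


def iso_utc(ts: datetime) -> str:
    """Fixed-width UTC ISO-8601, so string order equals time order (use aware datetimes)."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def order_keys(customer_id: str, order_id: str, placed_at: datetime) -> dict:
    """Overloaded keys for a hypothetical ORDER item in its customer's collection."""
    return {
        "PK": f"CUST#{customer_id}",                     # co-locate with the customer
        "SK": f"ORDER#{iso_utc(placed_at)}#{order_id}",  # coarse-to-fine: type, time, id
        "GSI1PK": f"ORDER#{order_id}",                   # overloaded index: fetch by order id
        "GSI1SK": f"CUST#{customer_id}",
    }


def orders_in_range_request(table_name: str, customer_id: str,
                            start_day: str, end_day_exclusive: str) -> dict:
    """Query kwargs for orders in [start_day, end_day_exclusive), newest first.

    Days are "YYYY-MM-DD". BETWEEN is inclusive, but "ORDER#<end day>" sorts
    before every SK stamped on that day, so the end day itself is excluded.
    """
    return {
        "TableName": table_name,
        "KeyConditionExpression": "PK = :pk AND SK BETWEEN :lo AND :hi",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"CUST#{customer_id}"},
            ":lo": {"S": f"ORDER#{start_day}"},
            ":hi": {"S": f"ORDER#{end_day_exclusive}"},
        },
        "ScanIndexForward": False,  # descending sort-key order: newest first
    }
```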

The alerts worth wiring before the next peak — the leading indicators, not the lagging “table throttling”:

Alert on	Metric	Threshold (starting point)	Why it’s leading
Hot key	Contributor Insights most-throttled key	Any sustained single-key dominance	Names the key before broad throttling
Write throttling	`WriteThrottleEvents` (table + each GSI)	> 0 sustained 5 min	Catches a GSI bottleneck early
Read throttling	`ReadThrottleEvents` (table + each GSI)	> 0 sustained 5 min	Hot read key or under-provisioned read
Consumed vs provisioned	`ConsumedWriteCapacityUnits` / `ProvisionedWriteCapacityUnits`	< 50% while throttling	The hot-partition signature
System errors	`SystemErrors` (5xx)	> 0	Distinguishes platform from your throttling
Conditional failures	`ConditionalCheckFailedRequests`	Rising trend	Concurrency contention building
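Here is one of these alarms as code. This minimal sketch builds kwargs for boto3's CloudWatch `put_metric_alarm`. It assumes the AWS/DynamoDB namespace's TableName and GlobalSecondaryIndexName dimensions and an existing SNS topic, whose ARN is a placeholder. The threshold and window mirror the starting points in the table.

```python
def gsi_write_throttle_alarm(table_name: str, index_name: str, sns_topic_arn: str) -> dict:
    """put_metric_alarm kwargs: WriteThrottleEvents > 0 on one GSI for 5 consecutive minutes."""
    return {
        "AlarmName": f"{table_name}-{index_name}-write-throttle",
        "Namespace": "AWS/DynamoDB",
        "MetricName": "WriteThrottleEvents",
        "Dimensions": [
            {"Name": "TableName", "Value": table_name},
            {"Name": "GlobalSecondaryIndexName", "Value": index_name},
        ],
        "Statistic": "Sum",
        "Period": 60,            # one-minute buckets
        "EvaluationPeriods": 5,  # sustained for five of them
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
        # No data points means no throttle events, not an unknown state.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


# Usage: boto3.client("cloudwatch").put_metric_alarm(**gsi_write_throttle_alarm("app", "GSI1", topic_arn))
```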

Security notes

Encrypt with a customer-managed KMS key where compliance requires control over rotation and access. DynamoDB encrypts at rest by default with an AWS-owned key; switch to an AWS KMS CMK for auditable key policies and the ability to revoke access by disabling the key.
Scope IAM to items and attributes, not the whole table. Use dynamodb:LeadingKeys condition keys to restrict a principal to its own partition (multi-tenant isolation), and dynamodb:Attributes to limit which attributes a role can read or write. A tenant’s role should never be able to Query another tenant’s PK.
Prefer fine-grained access over a broad dynamodb:*. Separate read roles (GetItem, Query) from write roles (PutItem, UpdateItem), and gate Scan/DeleteTable/UpdateTable behind admin-only policies.
Reach DynamoDB over a VPC endpoint (Gateway or PrivateLink) so traffic never traverses the public internet, and attach an endpoint policy that further constrains which tables and actions are reachable from the VPC.
Turn on point-in-time recovery (PITR) for any table holding business data — it gives continuous backups and second-level restore, and the export-to-S3 path you use for migrations depends on it.
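Audit with CloudTrail. DynamoDB control-plane calls (CreateTable, UpdateTable, DeleteTable) are logged as management events by default; item-level data-plane calls are data events you must enable explicitly, so turn them on selectively for sensitive tables.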
Keep secrets out of items. Item attributes are not a secret store; reference Secrets Manager / Parameter Store for credentials, and never project a sensitive attribute into a GSI you query broadly.
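Here is a minimal sketch of the LeadingKeys pattern as an IAM policy document. It assumes partition keys shaped TENANT#<id> and a principal tag named tenant_id; both are hypothetical. IAM resolves the `${aws:PrincipalTag/...}` policy variable per caller. The policy grants only the actions you pass, read-only by default, which matches the read/write role split above. It scopes the table ARN only, so index access needs its own statement and testing.

```python
import json
from typing import Sequence

READ_ACTIONS = ("dynamodb:GetItem", "dynamodb:Query")


def tenant_item_policy(table_arn: str, actions: Sequence[str] = READ_ACTIONS,
                       tag_key: str = "tenant_id") -> dict:
    """IAM policy letting a principal touch only items whose PK is TENANT#<its tag value>."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "TenantScopedItemAccess",
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": [table_arn],
                "Condition": {
                    # Every partition key the request touches must equal the caller's tenant key.
                    "ForAllValues:StringEquals": {
                        "dynamodb:LeadingKeys": [f"TENANT#${{aws:PrincipalTag/{tag_key}}}"]
                    }
                },
            }
        ],
    }


# Usage: print(json.dumps(tenant_item_policy("arn:aws:dynamodb:ap-south-1:111122223333:table/app"), indent=2))
```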

The security controls that also prevent operational incidents (secure and resilient pull in the same direction):

Control	Mechanism	Secures against	Also prevents
KMS CMK encryption	`SSESpecification` + key policy	Plaintext-at-rest exposure	Unauthorized restore from snapshots
Tenant isolation	`dynamodb:LeadingKeys` condition	Cross-tenant reads	A tenant hot-keying another’s partition
Attribute-level scope	`dynamodb:Attributes` condition	Over-reading sensitive fields	Accidental wide projections
VPC endpoint + policy	Gateway/PrivateLink endpoint	Public-internet exposure	Data exfiltration paths
PITR	Continuous backups	Data loss / bad deploy	Migration export depends on it
Least-privilege IAM	Split read/write/admin roles	Broad `dynamodb:*` blast radius	A bad job running `Scan`/`DeleteTable`

Cost & sizing

The bill drivers and how they interact with the design:

Capacity is the dominant line. On-demand bills per request unit (write/read request units); provisioned bills per provisioned WCU/RCU-hour whether used or not. For steady load, provisioned + reserved is materially cheaper per request; for spiky or unknown load, on-demand avoids both over-provisioning and throttle-and-retry waste.
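GSIs multiply write cost. A write that changes an attribute projected into a GSI also consumes write capacity on that index, sized by the projected item, and a write that changes the index key costs two index writes (delete the old entry, put the new one). An ALL projection on a wide, hot item can quietly double or triple write spend, which is why you project narrowly.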
Transactions cost 2× WCU. A TransactWriteItems of two items costs as if you wrote four. Use transactions for invariants, not as a default wrapper.
Storage is cheap but real. You pay per GB-month for the table plus each GSI’s projected copy. Sparse indexes and narrow projections keep this down.
Streams and PITR add modest charges: Streams per streams read request unit (GetRecords calls made by a Lambda trigger are not charged), PITR per GB-month. Both are small relative to capacity, and both are worth it.
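The per-write arithmetic can be sketched under stated assumptions: 1 WCU per 1 KB, rounded up; each index write sized by the item's projected size in that index; two index writes when the index key changes; and the transactional 2× applied to the base-table write only. Check how index writes inside a transaction are billed before relying on that last assumption. The function name and inputs are hypothetical.

```python
import math
from typing import Sequence


def write_cost_wcu(item_bytes: int, gsi_projected_bytes: Sequence[int],
                   gsi_key_changed: Sequence[bool], transactional: bool = False) -> int:
    """Approximate WCU for one write, including replication into each affected GSI."""
    def units(size: int) -> int:
        return max(1, math.ceil(size / 1024))  # 1 WCU per 1 KB, rounded up

    base = units(item_bytes) * (2 if transactional else 1)
    index_writes = sum(
        units(size) * (2 if key_changed else 1)  # key change = delete old + put new
        for size, key_changed in zip(gsi_projected_bytes, gsi_key_changed)
    )
    return base + index_writes
```

Two 1 KB items in a TransactWriteItems come to 2 × 2 = 4 WCU, which matches the "as if you wrote four" rule above. Adding an ALL-projected 3 KB index to an 800-byte KEYS_ONLY-sized write moves the cost from 1 to 4 WCU.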

A rough monthly picture for a small-to-mid production table (~25 GB, ~5M writes/day, ~20M reads/day, two GSIs with INCLUDE):

Cost driver	What you pay for	Rough INR / month	What drives it up	Lever to pull
On-demand writes	Write request units	~₹3,000–6,000	Write volume × GSI count	Narrow projections; fewer GSIs
On-demand reads	Read request units (eventual = ½)	~₹1,500–3,000	Read volume; strong-consistent reads	Use eventual reads where safe
Provisioned (alt.)	WCU/RCU-hours + reserved	~₹2,000–4,000 steady	Over-provisioning headroom	Auto scaling + reserved floor
GSI storage	Per-GB projected copies	~₹500–1,500	`ALL` projections; many GSIs	`KEYS_ONLY`/`INCLUDE`; sparse
Streams	Stream read request units	~₹300–800	Change rate × consumers	Filter at the consumer
PITR + backups	Per-GB-month continuous	~₹400–1,000	Table size	Keep, it’s cheap insurance
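The ranges above are rough. To redo the arithmetic for your own region and traffic, this minimal sketch turns a workload into monthly request units. It assumes writes of at most 1 KB, reads of at most 4 KB, and a 30-day month. It deliberately assumes no prices: take them from the AWS pricing page for your region. For the workload stated above, it gives 150M write request units if writes touch no index and 450M if every write propagates to both GSIs, plus 300M read request units with all reads eventually consistent.

```python
def monthly_request_units(writes_per_day: float, reads_per_day: float,
                          gsi_writes_per_write: float, eventual_share: float = 1.0,
                          days: int = 30) -> dict:
    """On-demand request units per month (writes <= 1 KB, reads <= 4 KB assumed).

    gsi_writes_per_write: average index writes each base write triggers (0 up to the GSI count).
    eventual_share: fraction of reads that are eventually consistent (half a unit each).
    """
    write_units = writes_per_day * days * (1 + gsi_writes_per_write)
    read_units = reads_per_day * days * (eventual_share * 0.5 + (1 - eventual_share) * 1.0)
    return {"write_request_units": write_units, "read_request_units": read_units}


def monthly_cost(units: dict, price_per_million_wru: float, price_per_million_rru: float) -> float:
    """Multiply by caller-supplied prices (per million request units) for your region."""
    return (units["write_request_units"] / 1e6 * price_per_million_wru
            + units["read_request_units"] / 1e6 * price_per_million_rru)
```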

Free-tier reality: DynamoDB’s perpetual free tier covers 25 GB of storage and 25 provisioned WCU + 25 RCU (enough for ~200M requests/month) on provisioned mode — a real production-grade allowance for small workloads. On-demand has no perpetual free allowance but is pay-per-use, so a low-traffic table costs pennies. The cheapest correct design is almost always “fewer, narrower GSIs + the right capacity mode,” not a bigger anything — the same lesson as the hot-partition fix: model it right and the bill follows.
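This sketch shows roughly where the ~200M figure comes from. It assumes full utilization around the clock, a 30-day month, writes of at most 1 KB, and eventually consistent reads of at most 4 KB. AWS's own derivation may rest on different assumptions.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 30-day month: 2,592,000 s


def provisioned_monthly_requests(wcu: int, rcu: int) -> dict:
    """Requests per month if provisioned capacity is fully used around the clock.

    One WCU serves one write of up to 1 KB per second; one RCU serves two
    eventually consistent reads of up to 4 KB per second.
    """
    writes = wcu * SECONDS_PER_MONTH
    reads = rcu * 2 * SECONDS_PER_MONTH
    return {"writes": writes, "reads": reads, "total": writes + reads}
```

With 25 WCU and 25 RCU this gives 64.8M writes plus 129.6M reads, about 194M requests.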

Interview & exam questions

1. Why design DynamoDB schemas “backward” from access patterns instead of from entities? Because DynamoDB has no server-side join and no query planner — the only way to relate items cheaply is to co-locate them at write time. If you model entities first, you discover the queries your application needs require joins DynamoDB can’t do, and you refactor. Enumerating access patterns first lets you design keys that make each query a single-partition read.

2. What is entity overloading and why does single-table design depend on it? Naming the primary-key attributes generically (PK, SK) and encoding the entity type into the value with a prefix (CUST#, ORDER#, ITEM#), so multiple entity types share one table and one partition can hold a parent plus its children (an item collection). It’s the mechanism that lets a single Query return an aggregate, which is the whole point of single-table design.

3. Explain a sparse GSI and a real use for one. An item appears in a GSI only if it has all of that index’s key attributes (the partition key and, if the index defines one, the sort key). So if you write the GSI key only while an item is in a particular state and remove it on transition, the index holds exactly that working set. The canonical use is an “open orders” queue: index only OPEN orders, remove the key when they ship, and the ops dashboard queries a fraction of the data.

4. What are the per-partition throughput limits and why do they cause throttling at low table utilization? A single physical partition sustains roughly 1,000 WCU and 3,000 RCU. A hot key concentrates traffic on one partition and hits that ceiling even though the table’s total provisioned capacity is barely touched — so you see ThrottledRequests while ConsumedWriteCapacityUnits is at 40% of provisioned. The fix is write-sharding, not more capacity.

5. How does write-sharding work and what does it trade? You append a suffix (#0..#9) to the partition key so writes fan across multiple physical partitions, with the shard count sized to required WCU / 1,000. A calculated suffix (hash of a key attribute) lets a point read recompute the exact shard; a random suffix maximizes spread but forces reads to scatter across all N shards and merge client-side. The trade is read fan-out for write throughput.
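Here is a sketch of both suffix strategies and the scatter-gather read; all names are hypothetical. It hashes with SHA-256 rather than Python's built-in `hash()`, which is salted per process and would send the same key to different shards across restarts. `query_shard` stands in for whatever issues one Query per shard partition key.

```python
import hashlib
import random
from typing import Callable, Iterable, List


def shard_count_for(required_wcu: int, per_partition_wcu: int = 1000) -> int:
    """Shards needed so no single partition key exceeds ~1,000 WCU (ceiling division)."""
    return max(1, -(-required_wcu // per_partition_wcu))


def calculated_shard(base_key: str, shard_source: str, shard_count: int) -> str:
    """Deterministic suffix: the same shard_source always maps to the same shard."""
    digest = hashlib.sha256(shard_source.encode("utf-8")).digest()
    return f"{base_key}#{int.from_bytes(digest[:8], 'big') % shard_count}"


def random_shard(base_key: str, shard_count: int) -> str:
    """Random suffix: maximal spread, but a reader cannot know which shard to hit."""
    return f"{base_key}#{random.randrange(shard_count)}"


def scatter_gather(base_key: str, shard_count: int,
                   query_shard: Callable[[str], Iterable[dict]],
                   sort_attr: str = "SK") -> List[dict]:
    """Query every shard and merge the results client-side in sort-key order."""
    items: List[dict] = []
    for shard in range(shard_count):
        items.extend(query_shard(f"{base_key}#{shard}"))
    return sorted(items, key=lambda item: item[sort_attr])
```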

6. When do you choose on-demand vs provisioned capacity? By traffic shape. On-demand for new, spiky, or unpredictable workloads — it scales instantly and needs no planning. Provisioned + auto scaling (with a reserved floor for the baseline) for steady, predictable, diurnal load — it’s materially cheaper per request. You can only switch modes once per 24 hours, so it’s not a runtime knob.

7. What’s the difference between a FilterExpression and a KeyConditionExpression in cost terms? A KeyConditionExpression selects items by key before reading them, so you pay only for what you select. A FilterExpression runs after the key query and reduces the returned payload but not the capacity consumed or items examined. A pattern served only by Scan + FilterExpression reads the whole table and pays for it.
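The cost difference is plain arithmetic. This minimal sketch assumes that the items read in one Query or Scan are summed and rounded up to the next 4 KB, at 1 RCU strongly consistent or 0.5 RCU eventually consistent. It also assumes a read costs at least one unit, and it ignores that a Scan pages at 1 MB and rounds per page. Pass the size of the items *read*, not returned, because a FilterExpression does not reduce that number.

```python
import math


def read_rcu(bytes_read: int, strongly_consistent: bool = False) -> float:
    """RCU for one read: size of items read (pre-filter), rounded up to 4 KB units."""
    units = max(1, math.ceil(bytes_read / 4096))  # assumed minimum of one unit
    return units if strongly_consistent else units / 2
```

A Query whose key condition selects 12 items of 400 bytes reads 4,800 bytes, which is 1 RCU eventually consistent. A Scan + FilterExpression that returns the same 12 items from a 1,000,000-item table still reads all 400 MB, which is roughly 48,828.5 RCU.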

8. How do you model a many-to-many relationship in a single table? Materialize the edge as an item and store both directions by writing generic GSI keys that invert PK and SK. For users-in-groups, the membership item has PK=USER#u1, SK=GROUP#g1 and GSI1PK=GROUP#g1, GSI1SK=USER#u1: the base table answers “groups for a user,” GSI1 answers “users in a group” — one item, two query directions, no second write to keep in sync.
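Here are the membership item and both queries as boto3 kwargs. This is a minimal sketch with hypothetical table and index names (`app`, `GSI1`). One item carries both directions, and the begins_with prefix keeps each query on the edge rows.

```python
def membership_item(user_id: str, group_id: str, joined_at: str) -> dict:
    """One edge item, readable from both sides via the inverted GSI1 keys."""
    return {
        "PK": {"S": f"USER#{user_id}"},
        "SK": {"S": f"GROUP#{group_id}"},
        "GSI1PK": {"S": f"GROUP#{group_id}"},  # key-flip: partition on the group
        "GSI1SK": {"S": f"USER#{user_id}"},
        "joinedAt": {"S": joined_at},
    }


def groups_for_user_request(table_name: str, user_id: str) -> dict:
    """Base table: all groups a user belongs to."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :prefix)",
        "ExpressionAttributeValues": {":pk": {"S": f"USER#{user_id}"}, ":prefix": {"S": "GROUP#"}},
    }


def users_in_group_request(table_name: str, group_id: str, index_name: str = "GSI1") -> dict:
    """GSI1: all users in a group, read from the same items."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "GSI1PK = :pk AND begins_with(GSI1SK, :prefix)",
        "ExpressionAttributeValues": {":pk": {"S": f"GROUP#{group_id}"}, ":prefix": {"S": "USER#"}},
    }
```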

9. What does TransactWriteItems guarantee and what does it cost? All-or-nothing across up to 100 items (and multiple tables), each with its own condition; if any condition fails or it collides with another transaction on the same item, the whole thing is canceled (TransactionCanceledException with per-item reasons). It costs 2× the WCU of the equivalent non-transactional writes (prepare + commit). Use it for genuine multi-item invariants, not as a default.

10. How do you add a GSI to a live, high-traffic table without downtime or stale data? UpdateTable to create the GSI — it’s an online operation that backfills in the background while the table stays available; don’t query the index until it’s ACTIVE (watch OnlineIndexPercentageProgress). A new GSI only contains items that already carry its key attributes, so you run an idempotent, throttled backfill (parallel Scan + conditional UpdateItem), or export to S3, transform, and BatchWriteItem back to keep it off the live capacity.
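Two pieces of that procedure can be sketched as follows. The sketch assumes boto3's UpdateTable and DescribeTable response shapes and an on-demand table; a provisioned table also needs ProvisionedThroughput inside the Create block. The index name, key attribute names, and projected attributes are hypothetical. Poll `describe_table` until `gsi_ready` returns True before sending any Query with that IndexName.

```python
from typing import Any, Mapping, Sequence


def create_gsi_request(table_name: str, index_name: str = "GSI2",
                       projected: Sequence[str] = ("status",)) -> dict:
    """UpdateTable kwargs that add an overloaded, INCLUDE-projected GSI (on-demand table)."""
    pk, sk = f"{index_name}PK", f"{index_name}SK"
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": pk, "AttributeType": "S"},
            {"AttributeName": sk, "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexUpdates": [{"Create": {
            "IndexName": index_name,
            "KeySchema": [{"AttributeName": pk, "KeyType": "HASH"},
                          {"AttributeName": sk, "KeyType": "RANGE"}],
            "Projection": {"ProjectionType": "INCLUDE", "NonKeyAttributes": list(projected)},
        }}],
    }


def gsi_ready(describe_table_response: Mapping[str, Any], index_name: str) -> bool:
    """True once the named GSI is ACTIVE and no longer backfilling."""
    for gsi in describe_table_response["Table"].get("GlobalSecondaryIndexes", []):
        if gsi["IndexName"] == index_name:
            return gsi.get("IndexStatus") == "ACTIVE" and not gsi.get("Backfilling", False)
    return False  # not created yet, or the name is wrong
```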

11. What is the 400 KB limit and how do you design around it? It’s the absolute maximum size of a single item, including all attribute names and values. The classic violation is appending to an unbounded array (events, line items) until the rollup item crosses it and writes fail with ValidationException. Design around it by modeling each increment as its own item under an adjacency-list sort key, and keeping a small fixed-size summary item for cheap reads.

12. Why can an under-provisioned GSI throttle your base table? Under provisioned capacity mode, a GSI has its own throughput; if a write touches a projected attribute and the GSI can’t absorb the replicated write, that backpressure throttles the base-table write too. So an under-provisioned index becomes a write bottleneck for the whole table — size each GSI’s WCU to the base write rate that touches its keys, and project narrowly to reduce replication.

These map to the AWS Certified Developer – Associate (DVA-C02) exam (developing solutions with DynamoDB: data modeling, GSIs, capacity, transactions), and to the AWS Certified Solutions Architect – Associate (SAA-C03) and AWS Certified Solutions Architect – Professional (SAP-C02) exams for the cost, capacity, and scaling design trade-offs. The hot-partition and throughput mechanics also surface in the AWS Certified Data Engineer – Associate (DEA-C01). A compact cert mapping for revision:

Question theme	Primary cert	Exam objective area
Access patterns, key/GSI design	DVA-C02	Develop with DynamoDB; data modeling
Sparse indexes, projections	DVA-C02	Optimize DynamoDB access
Hot partitions, write-sharding	DEA-C01 / SAP-C02	Design for performance at scale
On-demand vs provisioned, auto scaling	SAA-C03	Design cost-optimized, resilient storage
Transactions, condition expressions	DVA-C02	Implement data consistency
Online GSI add, backfill, Streams	DEA-C01	Operationalize data pipelines

Quick check

A Query returns 12 items but --return-consumed-capacity TOTAL reports an RCU far larger than 12 rows would imply. What is almost certainly happening, and what’s the fix?
You see ThrottledRequests > 0 while ConsumedWriteCapacityUnits sits at 40% of provisioned. Name the root cause and the one design change that fixes it.
True or false: a Global Secondary Index can be read with strong consistency.
An “open jobs” sparse GSI keeps growing and never shrinks even as jobs complete. What did the code forget to do?
Your rollup item’s writes start failing with ValidationException: Item size has exceeded the maximum allowed size. What’s the cause and the re-modeling fix?

Answers

The call is reading far more than it returns: either a Scan + FilterExpression, or a Query whose KeyConditionExpression matches a broad range that a FilterExpression then trims. You pay capacity for every item read, including the discarded ones, and a ScannedCount far above Count in the response confirms it. The fix is to make the keys do the selection: design the PK/SK or add an overloaded GSI so a KeyConditionExpression selects, turning it into a true single-partition Query.
A hot partition — one partition key is over the per-partition ~1,000 WCU / ~3,000 RCU ceiling while the table’s total is barely used; adaptive capacity can’t exceed the per-partition limit. The fix is to write-shard that key (suffix #0..#9, sized to required WCU / 1,000) so writes fan across physical partitions.
False. GSIs are eventually consistent only; there is no strongly-consistent GSI read. If you need strong consistency, GetItem the base-table item (or use an LSI, which supports strong reads on the same partition).
It forgot to remove the sparse key attributes (REMOVE GSIxPK, GSIxSK) on the state transition. A sparse index keeps any item that still has both key attributes, so “done” items linger until you strip the keys.
An attribute — typically an append-style array — has grown the whole item past the absolute 400 KB limit. Re-model by storing each increment as its own item under an adjacency-list sort key (ITEM#/MILESTONE#), and keep a small fixed-size “latest” summary item for the dashboard read.

Glossary

Single-table design — the practice of storing many entity types in one DynamoDB table so related items co-locate and queries become single-partition reads.
Access pattern — one concrete read or write the application needs, with its filter, sort order, and cardinality; the artifact the schema is derived from.
Partition key (PK) — the key DynamoDB hashes to choose a physical partition; decides co-location and hot-key risk.
Sort key (SK) — orders items within a partition and enables range queries (begins_with, BETWEEN) and adjacency lists.
Entity overloading — naming keys generically (PK/SK) and encoding the entity type as a value prefix so multiple entities share the table.
Item collection — all items sharing one partition key; a single Query returns the whole collection (parent + children).
Adjacency list — the one-to-many pattern where children share the parent’s partition (ORDER# owns its ITEM# rows).
Global Secondary Index (GSI) — an alternate (PK, SK) over the same items, maintained asynchronously (eventually consistent), with its own throughput and projection.
Index overloading — generic GSIxPK/GSIxSK attributes each entity populates differently, so one index serves many logical patterns.
Sparse index — a GSI that contains an item only if the item has all of the index’s key attributes; used to index a working set.
Projection — which attributes a GSI copies: KEYS_ONLY, INCLUDE (a named list), or ALL; drives index storage and write cost.
Hot partition / hot key — a partition key taking disproportionate traffic, hitting the per-partition ~1,000 WCU / ~3,000 RCU ceiling and throttling.
Adaptive capacity — DynamoDB automatically shifting throughput toward busy partitions and isolating hot items; cannot exceed per-partition limits.
Write sharding — appending a suffix to a partition key (calculated or random) to fan a high-write key across multiple physical partitions.
On-demand / provisioned — the two capacity modes: pay-per-request (instant scale) versus reserved RCU/WCU (cheaper at steady load, paired with auto scaling).
Condition expression — a predicate evaluated atomically with a write (e.g. attribute_not_exists(PK), version = :v); if it evaluates to false, the write is rejected with ConditionalCheckFailedException.
Optimistic concurrency — using a version attribute and a condition so a lost update is rejected (ConditionalCheckFailedException) rather than silently overwritten.
TransactWriteItems — an all-or-nothing write across up to 100 items (and multiple tables), each with its own condition; costs 2× WCU.
400 KB item limit — the absolute maximum size of one item; the reason append-style data must be modeled as separate items.
DynamoDB Streams — an ordered change log of item-level modifications, consumed by Lambda for denormalization, reconciliation, and backfills.

Next steps

You can now take an access-pattern list, derive an overloaded key schema and GSIs that serve every read in one call, and defend it against hot partitions and the 400 KB limit. Build outward:

Next: DynamoDB Deep Dive: Tables, Keys, Capacity, GSIs & Streams — the full service surface behind the modeling craft in this article.
Related: DynamoDB Streams: Change Data Capture & Event-Driven Pipelines — the reconciliation and backfill machinery for evolving a single-table schema.
Related: EventBridge: Event-Driven Architecture with Buses, Schema Registry & Pipes — fan single-table changes out to downstream consumers cleanly.
Related: Step Functions: Distributed Orchestration & Error-Handling Patterns — orchestrate multi-item, multi-table transactions and backfills with retries.
Related: Cosmos DB Partition Key Design & RU Optimization — the same partition-design discipline on Azure’s NoSQL store.