Cosmos DB Multi-Region Writes: Consistency Levels and Conflict Resolution

Multi-region writes are the feature that makes Azure Cosmos DB look like magic in a demo and like a distributed-systems trap in production. Azure Cosmos DB is Microsoft’s globally distributed, multi-model database with single-digit-millisecond reads and a turnkey 99.999% SLA; multi-region writes (formerly “multi-master”) let every region you add accept writes for the same logical data, instead of one primary write region and a fleet of read replicas. The moment two regions can both accept writes for the same logical partition, you have surrendered the comfortable single-writer world and signed up for conflict resolution, weaker consistency, and a much harder mental model. None of that is a reason to avoid it: for globally distributed, write-heavy, low-latency workloads it is the right tool. But you have to configure it deliberately.

This guide walks the full path: enabling multi-region writes, picking a consistency level you can actually defend in an SLA review, and building both last-writer-wins (LWW) and custom conflict resolution that behaves correctly when a region drops. Because this is a reference you will return to at 02:00 during a regional incident — or three weeks later when reconciliation flags a ledger that disagrees with the payment processor — the option matrices, the consistency comparison, the conflict-type reference, the limits and the symptom→cause→confirm→fix playbook are all laid out as scannable tables. Read the prose once, then keep the tables open.

Everything here assumes the Cosmos DB for NoSQL API. The consistency model is API-agnostic, but conflict-resolution policies and the conflicts feed are specific to the NoSQL API; Cassandra, MongoDB and Gremlin handle conflicts differently (typically LWW only, with no pluggable resolver). By the end you will stop guessing: you will know which consistency level buys you which RPO, why Strong is off the table the instant you enable multi-write, why default LWW on _ts quietly loses money in an ordered domain, and exactly which az command confirms each of those facts.

What problem this solves

A single-write-region Cosmos DB account is simple to reason about: one region orders every write, the rest catch up, and a failover promotes a replica. That simplicity costs you write latency for users far from the primary. An order placed in Singapore against a primary in East US 2 pays a cross-Pacific round trip on every write — 180–250 ms when the local read was 5 ms. For a write-heavy, latency-sensitive workload (carts, sessions, telemetry ingest, IoT device state, collaborative editing) that is the difference between a snappy app and a sluggish one, and no amount of read-replica scaling fixes it because the write still crosses the ocean.

Multi-region writes fix the latency by letting the nearest region accept the write and acknowledge locally. What breaks, if you do not understand the trade, is correctness. Teams flip enableMultipleWriteLocations because a blog said it improves availability, leave the container on its default LWW-on-_ts policy, and ship. Months later a partial network partition lets two regions edit the same document in the same second; _ts ties at one-second granularity; Cosmos deterministically (but arbitrarily) keeps one and silently discards the other. The loser never appears in the conflicts feed. In a stateful, ordered domain (a payment authorized in one region and captured in another), that is real money moving while the ledger disagrees, found only by an out-of-band reconciliation days later.

Who hits this: anyone running a globally distributed, write-active workload on Cosmos DB. It bites hardest on ordered state machines (payments, inventory, bookings) where LWW-on-_ts is almost never correct; on teams who chose Strong consistency for safety and then can’t enable multi-write at all; and on anyone who set Custom (no sproc) resolution and never built a drainer for the conflicts feed, so divergence accumulates invisibly. The fix is never “turn multi-write off” — it’s “make the consistency level and the conflict-resolution policy deliberate parts of your data model, and prove they behave under a region loss.” Here is the whole field in one frame before the deep dive:

Decision	The trap	What “right” looks like	Where it’s set
Multi-write on/off	“More regions = always better” — 3× write RU cost	On only where you need write locality; read replicas elsewhere	Account (`enableMultipleWriteLocations`)
Consistency level	Picking Strong, then can’t enable multi-write	Bounded Staleness / Session for most; relax per-request	Account default + per-request override
Conflict policy	Default LWW on `_ts` in an ordered domain	LWW on a monotonic `/version`, or a deterministic sproc	Container (set at creation, immutable)
Conflicts feed owner	Custom-no-sproc with nobody draining it	A continuous drainer + a depth alert	Application + Azure Monitor
Failover behaviour	Assuming a promotion step on the write path	RTO≈0 for writes; client `ApplicationPreferredRegions` retries locally	Account + `CosmosClientOptions`
RPO awareness	“Multi-region = no data loss”	RPO is non-zero except at Strong (unavailable here)	Consistency level governs it

Learning objectives

By the end of this article you can:

Enable multi-region writes safely on an existing account, in the right order, with az and Bicep — and explain why it roughly multiplies provisioned write RU/s by the number of write regions.
Place every workload on the correct point of the five-level consistency spectrum, justify the choice in latency/availability/RPO terms, and relax (never tighten) consistency per request.
State precisely why Strong consistency is incompatible with multi-region writes, and what Bounded Staleness gives you instead (including its enforced minimums).
Identify the three conflict types (insert, replace, delete) and predict how each surfaces under LWW, Custom-sproc, and Custom-manual policies.
Configure LWW on a custom numeric path correctly — keeping the path present, numeric and monotonic — and explain how client-clock skew turns into silent data loss.
Write a deterministic, idempotent resolver stored procedure, bind it to a container, and drain the conflicts feed in application code when manual resolution is required.
Rehearse a regional outage with a controlled failover (or endpoint block), configure ApplicationPreferredRegions, and quote the RPO/RTO for each consistency level.
Monitor replication latency and conflict activity in Azure Monitor so silent divergence can never hide.

Prerequisites & where this fits

You should already understand the Cosmos DB basics: an account is the top-level resource that owns regions and the consistency policy; a database is a namespace; a container holds items and owns the partition key, indexing policy, throughput, and the conflict-resolution policy. You should know that throughput is measured in Request Units per second (RU/s) (provisioned or autoscale), that every item lives in a logical partition keyed by your partition-key path, and that the .NET/Java/JS SDKs talk to Cosmos in Direct or Gateway mode. Comfort with az cosmosdb, reading JSON output in Cloud Shell, and basic distributed-systems vocabulary (quorum, linearizability, RPO/RTO) will make this land faster.

This sits in the Data & Global Distribution track. It assumes the modeling fundamentals from Cosmos DB Partition Key Design & RU Optimization — a bad partition key amplifies every problem here, because hot partitions and cross-partition fan-out get worse, not better, with more write regions. It is the database-layer companion to Azure Multi-Region Active-Active Architecture and pairs with global front-ends from Azure Front Door & Traffic Manager: Global Failover. The RPO/RTO framing comes from High Availability vs Disaster Recovery: RTO & RPO, and the consistency theory generalizes in Multi-Region Data Replication & Consistency Strategies. A quick map of who owns which decision during a design or an incident:

Layer	What lives here	Who usually owns it	What it can cause
Account regions & failover	`locations`, `failoverPriority`, automatic failover	Platform / SRE	Wrong write topology; surprise RU cost
Consistency policy	Default level + min staleness window	Architect + app lead	Too-weak RPO, or Strong blocking multi-write
Container conflict policy	LWW path / sproc / manual feed	App / data team	Silent data loss; divergence
Conflicts feed	Drainer job + depth alert	App + ops	Accumulating, invisible divergence
Client SDK	`ApplicationPreferredRegions`, session token, consistency override	App / dev team	No local failover; lost read-your-writes
Observability	`ReplicationLatency`, conflict metrics, alerts	Ops / SRE	Blind to lag and conflicts

Core concepts

Five mental models make every later decision obvious.

Multi-region writes means every region is a write region. Once you flip enableMultipleWriteLocations, failoverPriority no longer decides who can write (everyone can) — it only orders how regions are reprioritized during automatic failover. A write lands in the region nearest the client, commits and acknowledges locally, and replicates asynchronously to the others. That local ACK is the whole point — and the source of every conflict, because two regions can both ACK a write to the same document before they have heard from each other.

Consistency is a tunable, linear spectrum, and multi-write removes the strongest option. Cosmos exposes five levels from strongest to weakest. Stronger means reads see more recent, more ordered data at higher latency and lower availability; weaker means lower latency and higher availability at the cost of recency and ordering. The hard rule: Strong is incompatible with multi-region writes because linearizability requires one global order of writes, which independent write regions cannot provide. So a multi-write account chooses among Bounded Staleness, Session, Consistent Prefix and Eventual.

A conflict is two live versions of one document that meet during replication. With multiple writers, two clients can mutate the same id + partition key concurrently in different regions. When async replication brings those versions together, Cosmos detects a conflict. It does not panic and it does not block the write path — the regions already ACK’d locally. What happens next is governed entirely by the container’s conflict-resolution policy, chosen at container creation and effectively immutable.

The resolution policy is part of your data model, not an afterthought. Three policies exist. LWW auto-resolves on a numeric path (default _ts) — highest value wins, losers vanish silently. Custom (stored procedure) runs your JavaScript resolver on every conflict so you can merge or apply business rules. Custom (manual) writes every conflicting version to a per-container conflicts feed and stops, leaving your app to drain and reconcile. Choosing the wrong one for your domain — LWW-on-_ts for an ordered state machine — is a correctness bug, not a tuning miss.

RTO is near-zero; RPO is non-zero. Because every region already writes, losing a region does not require a promotion step on the write path — the SDK simply stops routing there, so RTO for writes is effectively zero. But whatever had not yet replicated when the region died is lost: that is your RPO, and it is non-zero for every multi-write consistency level. Bounded Staleness caps it to your configured window; Session/Consistent Prefix/Eventual leave it unbounded in the worst case. You buy RTO with multi-region writes and pay for it in RPO — internalize that sentence.
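
The RTO/RPO trade can be made concrete with a toy model. This is a sketch under stated assumptions, not a description of Cosmos DB internals: `Region`, `replicateTo` and `lostOnFailure` are hypothetical names, and replication is modeled as an explicit call rather than a background process.

```javascript
// Toy model of asynchronous replication (hypothetical names, not a Cosmos DB API).
// Each region ACKs writes locally; replication to peers happens later.
class Region {
  constructor(name) {
    this.name = name;
    this.log = [];        // writes ACKed locally, in order
    this.replicated = 0;  // how many of those writes have been shipped to a peer
    this.received = [];   // writes this region has received from peers
  }

  write(item) {
    this.log.push(item);    // local commit + ACK: no cross-region round trip
    return this.log.length; // local sequence number
  }

  replicateTo(peer, count) {
    // Ship up to `count` not-yet-replicated writes to the peer.
    const batch = this.log.slice(this.replicated, this.replicated + count);
    peer.received.push(...batch);
    this.replicated += batch.length;
  }

  lostOnFailure() {
    // Writes ACKed here but never shipped: the data lost if this region dies now.
    return this.log.slice(this.replicated);
  }
}

const singapore = new Region("Southeast Asia");
const europe = new Region("West Europe");
singapore.write({ id: "o1" });
singapore.write({ id: "o2" });
singapore.write({ id: "o3" });
singapore.replicateTo(europe, 2);       // only two writes made it out
const lost = singapore.lostOnFailure(); // the third write is the RPO
```

Clients can keep writing to `europe` immediately, which is the near-zero RTO; the contents of `lost` are the non-zero RPO.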

The vocabulary in one table

Pin down every moving part before the deep sections. The glossary repeats these for lookup; this table is the mental model side by side:

Term	One-line definition	Where it lives	Why it matters here
Multi-region writes	Every region accepts writes for the same data	Account toggle	Enables conflicts; ~N× write RU
Write region	A region that locally commits + ACKs writes	Account `locations`	Under multi-write, all of them
`failoverPriority`	Order regions are reprioritized on failover	Per location	Only orders failover, not who writes
Consistency level	The read recency/ordering guarantee	Account default + per request	Governs latency and RPO
Bounded Staleness	Lag capped by K versions OR T seconds	Consistency policy	The only bounded RPO under multi-write
Session token	`x-ms-session-token` scoping read-your-writes	SDK / response header	Must be flowed across tiers
Conflict	Two live versions of one `id`+PK meeting	Replication path	The thing the policy resolves
LWW	Highest numeric path value wins, silently	Container policy	Default; wrong for ordered state
Conflict-resolution path	The numeric property LWW compares	Container policy	`_ts` by default; prefer `/version`
Resolver sproc	JS that resolves each conflict your way	Registered on container	Must be deterministic + idempotent
Conflicts feed	Where unresolved versions land	Per container	Needs an owner + a depth alert
RPO	Data lost on a region failure	Consequence of level	Non-zero except at Strong (unavailable)
RTO	Time to recover write capability	Consequence of multi-write	≈0 for writes

1. Add regions and enable multi-region writes

Multi-region writes is an account-level toggle. You first need at least two regions associated with the account, then you flip enableMultipleWriteLocations. Adding regions is an online operation; enabling multi-write is not always online and can briefly affect availability, so do it in a maintenance window the first time.

With Azure CLI, add the read regions first, then enable multi-write:

# Add a second (and third) read region first
az cosmosdb update \
  --name kv-cosmos-prod \
  --resource-group rg-data-prod \
  --locations regionName="East US 2" failoverPriority=0 isZoneRedundant=true \
  --locations regionName="West Europe" failoverPriority=1 isZoneRedundant=true \
  --locations regionName="Southeast Asia" failoverPriority=2 isZoneRedundant=true

# Then enable multi-region writes
az cosmosdb update \
  --name kv-cosmos-prod \
  --resource-group rg-data-prod \
  --enable-multiple-write-locations true

A few things that bite people:

failoverPriority=0 marks the write region under single-write; automatic failover promotes the region with the next priority value (1). Priorities must be contiguous starting at 0 and unique.
Once multi-region writes are on, every region is a write region; failoverPriority then only governs the order regions are reprioritized during automatic failover, not who can write.
Zone redundancy (isZoneRedundant) is per region and can only be set when the region is added. You cannot toggle it in place later without removing and re-adding the region — and removing the last replica of data is not something you do casually.
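
The priority rule in the first point (unique, contiguous, starting at 0) is easy to check before a deployment. The following is a minimal sketch; `validateFailoverPriorities` is a hypothetical helper, and the input shape simply mirrors the `locations` array used in this article.

```javascript
// Checks that failover priorities are unique and contiguous from 0.
// Returns a list of human-readable problems; an empty list means the input is valid.
function validateFailoverPriorities(locations) {
  const errors = [];
  const seen = new Set();

  for (const { locationName, failoverPriority: p } of locations) {
    if (!Number.isInteger(p) || p < 0) {
      errors.push(`${locationName}: priority must be a non-negative integer`);
    } else if (seen.has(p)) {
      errors.push(`${locationName}: priority ${p} is already used`);
    }
    seen.add(p);
  }

  // Contiguity: with N regions, every value 0..N-1 must appear.
  for (let i = 0; i < locations.length; i++) {
    if (!seen.has(i)) errors.push(`no region has priority ${i}`);
  }
  return errors;
}

const ok = validateFailoverPriorities([
  { locationName: "East US 2", failoverPriority: 0 },
  { locationName: "West Europe", failoverPriority: 1 },
  { locationName: "Southeast Asia", failoverPriority: 2 },
]);

const gap = validateFailoverPriorities([
  { locationName: "East US 2", failoverPriority: 0 },
  { locationName: "West Europe", failoverPriority: 2 },
]);
```

Running a check like this in CI catches a gap or duplicate before the account update is attempted.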

Declaratively in Bicep, which is how this should live in your repo:

resource account 'Microsoft.DocumentDB/databaseAccounts@2024-11-15' = {
  name: 'kv-cosmos-prod'
  location: 'East US 2'
  kind: 'GlobalDocumentDB'
  properties: {
    databaseAccountOfferType: 'Standard'
    enableMultipleWriteLocations: true
    enableAutomaticFailover: true
    consistencyPolicy: {
      defaultConsistencyLevel: 'BoundedStaleness'
      maxStalenessPrefix: 100000
      maxIntervalInSeconds: 300
    }
    locations: [
      { locationName: 'East US 2',     failoverPriority: 0, isZoneRedundant: true }
      { locationName: 'West Europe',    failoverPriority: 1, isZoneRedundant: true }
      { locationName: 'Southeast Asia', failoverPriority: 2, isZoneRedundant: true }
    ]
  }
}

Each account-level knob, what it does, and the gotcha — read your row before you toggle anything:

Setting	Values	Default	When to change	Trade-off / gotcha
`enableMultipleWriteLocations`	true / false	false	You need write locality in >1 region	~N× write RU; introduces conflicts; not always online to enable
`enableAutomaticFailover`	true / false	false	Always, in prod	Harmless under multi-write; essential under single-write
`defaultConsistencyLevel`	Strong / BoundedStaleness / Session / ConsistentPrefix / Eventual	Session	Match your RPO/latency budget	Strong forbidden with multi-write
`maxStalenessPrefix`	≥100000 (multi-region)	—	Bounded Staleness only	Below the floor is rejected on a multi-region account
`maxIntervalInSeconds`	≥300 (multi-region)	—	Bounded Staleness only	Tighter (smaller) window costs latency/availability
`locations[].failoverPriority`	0…N-1, contiguous, unique	—	Reorder failover preference	Under multi-write, ordering only — not who writes
`locations[].isZoneRedundant`	true / false	false	Want AZ resilience in-region	Set at add-time only; not toggleable in place
`locations[].locationName`	any Azure region	—	Add/remove a region	Removing the last region of data deletes that copy

The flags that look similar but mean very different things — the distinctions that waste the most time:

Distinction	The trap	How to tell them apart
Add region vs enable multi-write	“I added a region, so it can write” — no, it’s a read replica until you flip the toggle	`writeLocations` lists every region only after `enableMultipleWriteLocations: true`
`failoverPriority` under single vs multi write	Assuming priority gates writes under multi-write	Under multi-write, priority only orders automatic failover; all regions write
Automatic failover vs multi-region writes	Thinking automatic failover gives you active-active writes	Automatic failover promotes a single write region; multi-write makes them all write
Zone redundant vs multi-region	Conflating in-region AZ HA with cross-region	`isZoneRedundant` is AZ-level inside one region; regions are the geo-level

Cost note: enabling multi-region writes roughly multiplies your provisioned RU/s cost for writes by the number of write regions, because writes replicate everywhere. Three write regions is three times the write throughput cost. Decide whether you genuinely need write locality in all three or whether one or two write regions plus read replicas is enough — read replicas cost RU too, but you control them independently and they never accept a conflicting write.
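
As a back-of-the-envelope aid, the rule of thumb in the cost note can be written down directly. This sketch encodes only the article's approximate N-times multiplier; it is not Azure's pricing formula, and `estimateWriteRu` is a hypothetical name.

```javascript
// Rough write-throughput multiplier: base write RU/s times the number of write regions.
// This is the article's rule of thumb, not a billing calculation.
function estimateWriteRu(baseWriteRuPerSec, writeRegions) {
  if (!Number.isInteger(writeRegions) || writeRegions < 1) {
    throw new RangeError("writeRegions must be a positive integer");
  }
  return baseWriteRuPerSec * writeRegions;
}

const threeWriteRegions = estimateWriteRu(4000, 3); // 12000
const twoWriteRegions = estimateWriteRu(4000, 2);   // 8000
```

Comparing the two results is a quick way to frame the "do we need write locality in all three regions?" question in a design review.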

The hard limits and real numbers you should know before designing the topology — these are the boundaries that turn a clean design into a 429 storm or a rejected operation:

Limit / quota	Real value	Applies to	What hitting it looks like	Note
`maxStalenessPrefix` floor (multi-region)	100,000 operations	Bounded Staleness, ≥2 regions	Operation rejected with a min-value error	Single-region floor is 10
`maxIntervalInSeconds` floor (multi-region)	300 seconds	Bounded Staleness, ≥2 regions	Operation rejected with a min-value error	Single-region floor is 5
Strong + multi-write	Not allowed	Account	`--enable-multiple-write-locations` rejected	Drop to a weaker level first
Per-physical-partition throughput	10,000 RU/s	Container	One partition 429s while container idle	Re-key, not more RU/s
Per-logical-partition storage	20 GB	Container	Writes to that PK fail at the cap	Choose a higher-cardinality PK
`_ts` granularity	1 second	LWW default path	Same-second writes tie → silent loss	Use a monotonic `/version`
Write RU multiplier	~N× (N write regions)	Account billing	Costs and 429s scale with region count	Use read replicas where write locality isn’t needed
Max item size	2 MB	Item	Write rejected above the cap	Split large docs

2. The five consistency levels and their trade-offs

Cosmos DB exposes a tunable, linear consistency spectrum. Stronger is to the left, more available and lower latency to the right:

Strong  >  Bounded Staleness  >  Session  >  Consistent Prefix  >  Eventual

The full comparison — the table you scan first when placing a workload:

Level	What it guarantees	Read latency	Write availability on partition	Multi-region writes?	RPO under region loss
Strong	Linearizable; reads see the latest committed write	Highest (cross-region quorum)	Lowest	Not allowed	0
Bounded Staleness	Lag bounded by K versions or T seconds; consistent-prefix within the bound	Higher	High	Allowed	Bounded by the staleness window
Session	Read-your-writes, monotonic reads/writes within a session token	Low	High	Allowed	Non-zero (unbounded worst case)
Consistent Prefix	Never see out-of-order writes; no recency bound	Low	High	Allowed	Non-zero (unbounded worst case)
Eventual	Replicas converge eventually; reads may be out of order	Lowest	Highest	Allowed	Non-zero (unbounded worst case)

The hard constraint, stated plainly: Strong consistency is incompatible with multi-region writes. Linearizability requires a single global ordering of writes, which you cannot have when multiple regions accept writes independently. If you try to enable multi-region writes on a Strong account, the operation is rejected. So the real choice for multi-write accounts is among Bounded Staleness, Session, Consistent Prefix and Eventual.

The default consistency level is set on the account, but a client can relax (never tighten) it per request. A Session-default account can issue an Eventual read for a cheap, fast lookup; it cannot request Strong. The relax-only rule and what each combination yields:

Account default	Per-request override allowed	Per-request override rejected	Typical use of the override
Strong (single-write only)	Bounded, Session, Prefix, Eventual	(none — already strongest)	Cheap reads on tolerant data
Bounded Staleness	Session, Consistent Prefix, Eventual	Strong	Lower-latency reads on cold paths
Session	Consistent Prefix, Eventual	Strong, Bounded	Fire-and-forget lookups
Consistent Prefix	Eventual	Strong, Bounded, Session	Telemetry / feed reads
Eventual	(none weaker)	everything stronger	n/a

// Relax to Eventual for a non-critical read (lower RU, lower latency)
var options = new ItemRequestOptions { ConsistencyLevel = ConsistencyLevel.Eventual };
var resp = await container.ReadItemAsync<Product>(
    id, new PartitionKey(tenantId), options);
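
The relax-only rule reduces to an index comparison on the ordered spectrum. The sketch below is a model of the rule for reasoning and tests, not SDK code; `canOverride` is a hypothetical helper and the level names follow the `defaultConsistencyLevel` values used earlier.

```javascript
// Levels ordered strongest to weakest.
const LEVELS = ["Strong", "BoundedStaleness", "Session", "ConsistentPrefix", "Eventual"];

// A per-request override is accepted only if it is the same as, or weaker than,
// the account default. Unknown level names throw.
function canOverride(accountDefault, requested) {
  const d = LEVELS.indexOf(accountDefault);
  const r = LEVELS.indexOf(requested);
  if (d === -1 || r === -1) throw new Error("unknown consistency level");
  return r >= d;
}

const sessionToEventual = canOverride("Session", "Eventual"); // true: relaxing
const sessionToStrong = canOverride("Session", "Strong");     // false: tightening
```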

The concrete read anomalies each level does and does not permit — this is the table that turns abstract guarantees into “can my code see X?”:

Anomaly a reader could observe	Strong	Bounded Staleness	Session	Consistent Prefix	Eventual
Stale read (misses latest write)	Never	Up to the window	Never in-session; possible cross-session	Possible	Possible
Out-of-order writes (see B before A)	Never	Never	Never in-session	Never	Possible
Non-monotonic reads (go backward in time)	Never	Never	Never in-session	Never	Possible
Read-your-own-writes fails	Never	Never (in-region strong)	Only without the token	Possible	Possible
Lag quantified / bounded	N/A (0)	Yes (K / T)	No	No	No

What each level costs and fixes, so the choice is an engineering decision not a vibe:

Level	RU cost relative	Latency profile	Availability	Fixes / good for	Risk it carries
Strong	Highest (reads ~2× RU)	Cross-region quorum on read	Lowest (no multi-write)	Single-region linearizable reads	Cannot do multi-write at all
Bounded Staleness	High	In-region = strong; cross-region bounded	High	Contractual freshness SLA	Min window forced (100000/300)
Session	Low (default)	Local, fast	High	Per-user apps with token flow	Cross-session reads can miss
Consistent Prefix	Low	Local, fast	High	Ordered feeds, no recency need	No recency bound at all
Eventual	Lowest	Local, fastest	Highest	Counters, telemetry, idempotent	Out-of-order reads

3. Bounded staleness vs session: choosing per workload

For multi-region writes, the two levels worth most of your attention are Bounded Staleness and Session, because they cover the majority of real requirements without paying full latency cost.

Bounded Staleness gives you a quantified staleness budget. You configure a maximum lag as both a version count (maxStalenessPrefix) and a time window (maxIntervalInSeconds); reads in any region are guaranteed to be no more stale than the tighter of the two. This is the level you want when you need a contractual freshness bound you can put in an SLA: “replicas are never more than 5 minutes behind.” For a multi-region-write account spanning two-plus regions, the minimums are maxStalenessPrefix >= 100000 and maxIntervalInSeconds >= 300. Inside a single region it still behaves like strong consistency, which is a useful property: clients pinned to one region get read-your-writes for free.

# Set Bounded Staleness with the multi-region minimums
az cosmosdb update --name kv-cosmos-prod --resource-group rg-data-prod \
  --default-consistency-level BoundedStaleness \
  --max-staleness-prefix 100000 \
  --max-interval 300
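
Because values below the floor are rejected, it can help to validate the window before calling the CLI. A minimal sketch, assuming the floors quoted in this article (100000 operations and 300 seconds with two or more regions, 10 operations and 5 seconds on a single-region account); `validateBoundedStaleness` is a hypothetical helper.

```javascript
// Validates a Bounded Staleness window against the floors quoted in this article.
// Returns a list of problems; an empty list means the window is acceptable.
function validateBoundedStaleness({ maxStalenessPrefix, maxIntervalInSeconds }, regionCount) {
  const floor = regionCount >= 2
    ? { prefix: 100000, interval: 300 }
    : { prefix: 10, interval: 5 };

  const errors = [];
  if (maxStalenessPrefix < floor.prefix) {
    errors.push(`maxStalenessPrefix must be >= ${floor.prefix}`);
  }
  if (maxIntervalInSeconds < floor.interval) {
    errors.push(`maxIntervalInSeconds must be >= ${floor.interval}`);
  }
  return errors;
}

const atFloor = validateBoundedStaleness(
  { maxStalenessPrefix: 100000, maxIntervalInSeconds: 300 }, 3);
const tooTight = validateBoundedStaleness(
  { maxStalenessPrefix: 50000, maxIntervalInSeconds: 60 }, 3);
```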

Session is the pragmatic default for most applications, and it is the actual Cosmos DB default. It guarantees consistency within a session — typically one user’s connection — via a session token (x-ms-session-token). The same client that wrote a document will read it back; it gets monotonic reads and writes. The catch is that the guarantee is scoped to the session token. If request A writes in East US 2 and request B (a different client, different token) reads in West Europe a few milliseconds later, B can miss the write. To preserve read-your-writes across tiers, you must flow the session token between services.

// Write returns a session token; capture and propagate it
var write = await container.CreateItemAsync(order, new PartitionKey(order.TenantId));
string sessionToken = write.Headers.Session; // pass to downstream via header/cookie

// A later read in another tier honors that token -> read-your-writes preserved
var read = await container.ReadItemAsync<Order>(
    order.Id, new PartitionKey(order.TenantId),
    new ItemRequestOptions { SessionToken = sessionToken });
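
Why the token matters is easier to see in a toy model. The sketch below is not the Cosmos SDK: `Replica` is a hypothetical class, and the session token is simplified to the sequence number of the caller's last write, whereas real session tokens are opaque strings that the SDK interprets and retries on for you.

```javascript
// Toy model of session-token reads (hypothetical names, not the Cosmos SDK).
// A replica tracks the highest sequence number it has applied.
class Replica {
  constructor() {
    this.items = new Map();
    this.applied = 0;
  }

  apply(seq, item) {
    this.items.set(item.id, item);
    this.applied = seq;
  }

  read(id, sessionToken) {
    // With a token, refuse to answer from a replica that has not caught up.
    if (sessionToken !== undefined && this.applied < sessionToken) {
      return { status: "not-caught-up" };
    }
    const item = this.items.get(id);
    return item ? { status: "ok", item } : { status: "not-found" };
  }
}

const eastUs2 = new Replica();
const westEurope = new Replica();

eastUs2.apply(1, { id: "order-1", status: "placed" }); // write ACKed in East US 2
const token = 1;                                        // handed back to the writer

const withoutToken = westEurope.read("order-1");        // stale answer, no warning
const withToken = westEurope.read("order-1", token);    // replica admits it is behind

westEurope.apply(1, { id: "order-1", status: "placed" }); // replication arrives
const afterCatchUp = westEurope.read("order-1", token);
```

Without the token the lagging replica returns a confident but stale answer; with it, the reader at least learns that the replica cannot yet honor read-your-writes.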

The two levels head-to-head on the properties you actually choose between:

Property	Bounded Staleness	Session
Scope of guarantee	Global, every reader	Per session token only
Read-your-writes	Yes, globally within the window; strong in-region	Yes, only if the token is carried
Freshness bound	Quantified (K versions / T seconds)	None across sessions
Minimums (multi-write)	`prefix>=100000`, `interval>=300`	none
RU cost	Higher	Lowest (default)
Best for	Multiple independent readers; SLA freshness	Per-user app you control end to end
Failure mode	Reads up to the window stale	Cross-session reader misses recent write
Token plumbing required	No	Yes (header/cookie across tiers)

The Bounded Staleness window parameters in detail — both bounds apply, the tighter one wins:

Parameter	Meaning	Multi-region minimum	Effect of decreasing it	Effect of increasing it
`maxStalenessPrefix`	Max number of versions a read can lag	100000 ops	Tighter freshness, more cross-region coordination	Looser freshness, cheaper, larger RPO
`maxIntervalInSeconds`	Max wall-clock lag	300 s	Tighter freshness, higher latency/availability cost	Looser freshness, larger RPO window
(single-region account)	Same params, smaller floors	10 ops / 5 s	n/a	n/a

Rule of thumb I apply, as a decision table:

If the workload is…	And readers…	Pick	Because
Per-user (cart, profile, session)	are the same user, token flows	Session	Cheapest correct read-your-writes
Multi-reader (dashboards, cache warmers)	cannot carry a session token	Bounded Staleness	Global bounded freshness without tokens
Needs a freshness SLA	external consumers read it	Bounded Staleness	You can promise “≤5 min stale”
Tolerant (counters, telemetry, feeds)	reconcile out of band	Consistent Prefix / Eventual	Lowest latency, highest availability
Must be linearizable	single region only	Strong	Only if you give up multi-write

4. Conflict types under multi-region writes

With multiple write regions, two clients can mutate the same document (same id + partition key) concurrently in different regions. When replication brings those versions together, Cosmos DB detects a conflict. There are three kinds, and how each surfaces depends entirely on the conflict-resolution policy you set on the container.

The three conflict types and how each behaves under each policy:

Conflict type	What happened	Under LWW	Under Custom (sproc)	Under Custom (manual)
Insert	Two regions create a doc with the same `id`+PK	Higher path value committed; loser discarded silently	Sproc receives both; decides winner/merge	Both land in the conflicts feed
Replace / update	Two regions update the same existing doc concurrently	Higher path value wins; loser discarded silently	Sproc receives incoming + existing + feed	Losers land in the conflicts feed
Delete	One region deletes a doc another region updates	Resolved by path; delete may win or lose	Sproc gets `isTombstone=true` to decide	Both versions surface in the feed

How a conflict surfaces depends entirely on the policy:

Last-Writer-Wins (LWW) — the default. Cosmos resolves conflicts automatically and silently using a numeric path (default _ts). The winner is committed; losers are discarded and never appear in the conflicts feed.
Custom (stored procedure) — your registered sproc resolves each conflict.
Custom (manual / no sproc) — Cosmos does not auto-resolve. Conflicting versions are written to a conflicts feed and your application must read it and resolve them.

The three policies compared on the properties that decide which one your domain needs:

Property	LWW (default)	Custom — stored procedure	Custom — manual feed
Who resolves	Cosmos, automatically	Your JS sproc, automatically	Your app, on its own schedule
Losers visible?	No (silently dropped)	Only if sproc routes them	Yes (in the conflicts feed)
Can it merge versions?	No (winner-takes-all)	Yes	Yes
Business rules possible?	No	Yes	Yes
Operational burden	Lowest	Medium (write + monitor sproc)	Highest (build + run a drainer)
Failure safety net	None	Sproc failure → routed to feed	Feed is the mechanism
Right for	Tolerant, last-write-truly-wins data	Ordered state, mergeable docs	Maximum control, audit-heavy domains
Latency on resolution	Inline, invisible	Inline, invisible	Deferred until drained
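
The row that matters most operationally is where the losing version ends up. A small lookup makes that explicit; `loserOutcome` is a hypothetical helper and the policy names are illustrative labels, not SDK enum values.

```javascript
// Where the losing version ends up under each policy, per the comparison above.
function loserOutcome(policy, { sprocSucceeded = true } = {}) {
  switch (policy) {
    case "LastWriterWins":
      return "discarded silently";
    case "CustomSproc":
      // A failing resolver is the case where the feed becomes the safety net.
      return sprocSucceeded ? "decided by resolver sproc" : "written to conflicts feed";
    case "CustomManual":
      return "written to conflicts feed";
    default:
      throw new Error(`unknown policy: ${policy}`);
  }
}

const lww = loserOutcome("LastWriterWins");
const sprocFailure = loserOutcome("CustomSproc", { sprocSucceeded: false });
```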

You set the policy at container creation. It cannot be changed after creation through most SDKs/portal, so choose deliberately — switching strategy generally means a new container and a migration. The immutability is the single most important fact in this article: the conflict-resolution policy is a data-model decision you make once.

5. Last-writer-wins with a custom path property

The default LWW policy resolves on the system property _ts (last-modified timestamp, second granularity). Second granularity is coarse: two writes in the same second tie, and Cosmos picks deterministically but not in a way you control. For correctness you often want LWW over a property you own — a monotonic version number, an epoch-millis timestamp, or a sequence assigned by your write path.

# Create a container with LWW resolving on a custom numeric path
az cosmosdb sql container create \
  --account-name kv-cosmos-prod \
  --resource-group rg-data-prod \
  --database-name shop \
  --name orders \
  --partition-key-path "/tenantId" \
  --conflict-resolution-policy '{"mode": "LastWriterWins", "conflictResolutionPath": "/version"}'

The path must point to a numeric field; the document with the higher value wins. Keep these invariants or LWW will silently lose data:

The path is always present and numeric on every write. A missing path is treated as 0.
The value is monotonically increasing per logical document. If you use client clocks, skew between regions becomes data loss — prefer a value you can guarantee increases (a version counter incremented on read-modify-write, or a hybrid logical clock).
Ties resolve deterministically but arbitrarily. Make the value unique enough to avoid ties on writes you care about.

The LWW path options ranked from worst to best for correctness:

Path choice	Granularity	Monotonic across regions?	Data-loss risk	Verdict
`_ts` (default)	1 second	Server-set, but ties in a second	High in ordered domains	Avoid for stateful/ordered data
Client wall-clock millis	1 ms	No — clock skew between regions	High (skew = lost writes)	Never; skew silently loses writes
Epoch millis from a single clock	1 ms	Only if one clock issues them	Medium	OK if you truly have one clock source
Per-doc version counter (RMW)	Per write	Yes, if increment is correct	Low	Good — the common correct choice
Hybrid logical clock (HLC)	Logical+physical	Yes, by construction	Lowest	Best for true causal ordering
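
To make the last row of the table concrete, here is a minimal sketch of a hybrid logical clock whose output can be stored in a numeric LWW path. The function names (`hlcLocal`, `hlcReceive`, `hlcPack`) and the 12-bit counter width are assumptions for illustration, not part of any Cosmos SDK; the update rules follow the standard HLC algorithm.

```javascript
// Hybrid logical clock sketch. State is { l, c }:
//   l = largest physical time (ms) seen so far
//   c = logical counter that orders events sharing the same l

// Advance the clock for a local write.
function hlcLocal(state, nowMs) {
  const l = Math.max(state.l, nowMs);
  const c = l === state.l ? state.c + 1 : 0;
  return { l, c };
}

// Advance the clock after observing a remote timestamp (for example, on read).
function hlcReceive(state, remote, nowMs) {
  const l = Math.max(state.l, remote.l, nowMs);
  let c;
  if (l === state.l && l === remote.l) c = Math.max(state.c, remote.c) + 1;
  else if (l === state.l) c = state.c + 1;
  else if (l === remote.l) c = remote.c + 1;
  else c = 0;
  return { l, c };
}

// Pack into one number for the LWW path. COUNTER_BITS = 12 is an assumption:
// it keeps the packed value a safe integer for epoch-millis values until the late 2030s.
const COUNTER_BITS = 12;
function hlcPack(ts) {
  if (ts.c >= 2 ** COUNTER_BITS) throw new Error('HLC counter overflow');
  const packed = ts.l * 2 ** COUNTER_BITS + ts.c;
  if (!Number.isSafeInteger(packed)) throw new Error('HLC value exceeds safe integer range');
  return packed;
}
```

The property that matters for LWW: a write that has observed an earlier write always packs to a larger number, even when the second writer's wall clock is behind the first.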

What “treated as 0” and “higher wins” mean for real edge cases:

Scenario	`/version` values	LWW outcome	Is it what you want?
Normal update	existing 7, incoming 8	incoming (8) wins	Yes
Missing path on one write	existing 7, incoming absent (=0)	existing (7) wins	Usually yes — but a real write with no version is a bug
Both absent	0 vs 0	deterministic-but-arbitrary	Dangerous — make version mandatory
Stale retry	existing 9, incoming 5	existing (9) wins	Yes — old retry correctly loses
Tie	8 vs 8	one wins arbitrarily	Only safe if 8==8 truly means “same”
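
The rows above can be expressed as a small comparison function, which is handy for unit-testing your write path's assumptions. This is a model of the rules as this article states them (including "missing is compared as 0"), not the service's implementation; `versionOf` and `lwwOutcome` are hypothetical names.

```javascript
// Read the numeric LWW value from a document.
// Assumption taken from this article: missing or non-numeric compares as 0.
function versionOf(doc, prop) {
  const v = doc ? doc[prop] : undefined;
  return typeof v === 'number' && Number.isFinite(v) ? v : 0;
}

// Which side wins under last-writer-wins on the given property.
function lwwOutcome(existing, incoming, prop) {
  const e = versionOf(existing, prop);
  const i = versionOf(incoming, prop);
  if (i > e) return 'incoming';
  if (e > i) return 'existing';
  return 'tie'; // the service picks one; the caller does not control which
}
```

For example, `lwwOutcome({ version: 9 }, { version: 5 }, 'version')` returns `'existing'`, which is the stale-retry row.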

Equivalent in Bicep, which is where this belongs for reproducibility:

resource ordersContainer 'Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers@2024-11-15' = {
  parent: shopDatabase
  name: 'orders'
  properties: {
    resource: {
      id: 'orders'
      partitionKey: { paths: [ '/tenantId' ], kind: 'Hash' }
      conflictResolutionPolicy: {
        mode: 'LastWriterWins'
        conflictResolutionPath: '/version'
      }
    }
  }
}

6. Custom conflict resolution via stored procedure and the conflicts feed

When LWW is too blunt — you need to merge concurrent edits, or apply business rules about which write wins — switch to custom resolution. There are two flavors.

6a. Stored-procedure resolution

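You register a JavaScript sproc as the resolver. On every conflict Cosmos invokes it with four arguments: the incoming item (null when the incoming operation is a delete), the currently committed item, a tombstone flag that is true when the incoming item conflicts with a previously deleted item, and an array of the committed items that conflict with the incoming item on id or another unique-index property. Your sproc decides the final state and writes it. The sproc signature is fixed: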

// resolver sproc: merges line items, keeps the max status rank
function resolver(incomingItem, existingItem, isTombstone, conflictingItems) {
  var collection = getContext().getCollection();
  var response = getContext().getResponse();

  // incomingItem === null means the incoming operation was a delete:
  // delete wins here, so remove the committed item if one exists.
  if (!incomingItem) {
    if (existingItem) {
      collection.deleteDocument(existingItem._self, {}, throwOnError);
    }
    return;
  }

  // isTombstone === true means the incoming write conflicts with an earlier
  // delete (existingItem is then null). This resolver lets the write win and
  // re-creates the item; return here instead if the delete should win.

  // Base the result on a full document so id and partition key are always set,
  // then overwrite the merged fields with absolute values (idempotent on re-run).
  var resolved = existingItem || incomingItem;
  resolved.id = incomingItem.id;
  resolved.lineItems = mergeById(
    (existingItem && existingItem.lineItems) || [],
    incomingItem.lineItems || []);
  resolved.status = Math.max(
    (existingItem && existingItem.status) || 0,
    incomingItem.status || 0);

  // Committed items that conflict with the incoming one must be folded in too
  (conflictingItems || []).forEach(function (c) {
    resolved.lineItems = mergeById(resolved.lineItems, c.lineItems || []);
    resolved.status = Math.max(resolved.status, c.status || 0);
  });

  collection.upsertDocument(collection.getSelfLink(), resolved, throwOnError);
  response.setBody(resolved);

  function throwOnError(e) { if (e) throw e; }

  function mergeById(a, b) { /* union by line id, prefer higher qty */
    var m = {};
    a.concat(b).forEach(function (x) {
      if (!m[x.id] || x.qty > m[x.id].qty) m[x.id] = x;
    });
    return Object.keys(m).map(function (k) { return m[k]; });
  }
}

The four arguments Cosmos passes the resolver, and what each is for:

Argument	Type	What it carries	Watch-out
`incomingItem`	object / null	The newly replicated version causing the conflict	Null when the incoming op was a delete
`existingItem`	object / null	The currently committed version in this region	Null on an insert-insert conflict
`isTombstone`	boolean	True if the incoming item conflicts with a previously deleted item	Decide delete-wins vs update-wins explicitly
`conflictingItems`	array	Committed items that conflict with the incoming item on id or another unique-index property	Must fold these in or you lose them

# 1) Create the container with its policy pointing at the resolver sproc path.
#    The CLI takes the policy as JSON (inline or @policy-file.json).
az cosmosdb sql container create \
  --account-name kv-cosmos-prod --resource-group rg-data-prod \
  --database-name shop --name orders \
  --partition-key-path "/tenantId" \
  --conflict-resolution-policy '{"mode": "Custom", "conflictResolutionProcedure": "dbs/shop/colls/orders/sprocs/resolver"}'

# 2) Register the sproc in that container under the name the policy references
az cosmosdb sql stored-procedure create \
  --account-name kv-cosmos-prod \
  --resource-group rg-data-prod \
  --database-name shop \
  --container-name orders \
  --name resolver \
  --body @resolver.js

Key constraints on the resolver sproc, each with the consequence of getting it wrong:

Constraint	Why it exists	Consequence if violated	How to satisfy it
Scoped to one partition key per invocation	Sprocs run within a single logical partition	Cannot resolve cross-partition conflicts	Keep conflicts within a partition by design
Must be deterministic	Cosmos may invoke it more than once	Divergent regional state	Same inputs → same output, always
Must be idempotent	Re-invocation must be safe	Double-applied merges, drift	Resolve to an absolute state, not a delta
Failure → routed to conflicts feed	Safety net, not a happy path	Silent divergence if you don’t monitor	Alert on feed depth; treat throws as incidents
Bound at container creation	Policy is immutable	Can’t swap strategy in place	New container + migration to change
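
The determinism and idempotency rows are easy to check outside Cosmos if the merge rule lives in a pure function. The sketch below is one way to do that, assuming line items shaped like `{ id, qty }`; `mergeLineItems` and `resolveState` are hypothetical helpers, not the sproc itself.

```javascript
// Union line items by id, keeping the higher qty. Output is sorted by id so the
// result does not depend on the order the inputs arrived in.
// Assumption: two items with the same id and the same qty are identical.
function mergeLineItems(a, b) {
  const byId = {};
  a.concat(b).forEach((x) => {
    if (!byId[x.id] || x.qty > byId[x.id].qty) byId[x.id] = x;
  });
  return Object.keys(byId).sort().map((k) => byId[k]);
}

// Resolve any number of versions to one absolute state (never a delta).
function resolveState(versions) {
  return versions.filter(Boolean).reduce(
    (acc, v) => ({
      lineItems: mergeLineItems(acc.lineItems, v.lineItems || []),
      status: Math.max(acc.status, v.status || 0),
    }),
    { lineItems: [], status: 0 }
  );
}
```

Two properties are worth asserting in a unit test: feeding the same versions in a different order gives the same result, and feeding the result back in alongside the originals changes nothing.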

6b. Manual resolution via the conflicts feed

Set the policy to Custom with no resolver procedure. Now Cosmos writes every conflicting version to the per-container conflicts feed and stops. Your application drains it and resolves on its own terms.

# Custom policy with NO sproc => manual feed resolution
az cosmosdb sql container create \
  --account-name kv-cosmos-prod --resource-group rg-data-prod \
  --database-name shop --name ledger \
  --partition-key-path "/accountId" \
  --conflict-resolution-policy '{"mode": "Custom"}'

// Drain the conflicts feed and resolve in application code
using var iterator = container.Conflicts.GetConflictQueryIterator<ConflictProperties>();
while (iterator.HasMoreResults)
{
    foreach (var conflict in await iterator.ReadNextAsync())
    {
        // The losing version that landed in the feed
        Order conflicting = container.Conflicts.ReadConflictContent<Order>(conflict);
        // The currently committed version
        Order committed = await container.ReadItemAsync<Order>(
            conflicting.Id, new PartitionKey(conflicting.TenantId));

        Order winner = Merge(committed, conflicting); // your business rule
        await container.ReplaceItemAsync(winner, winner.Id, new PartitionKey(winner.TenantId));

        // Delete the entry from the feed once handled
        await container.Conflicts.DeleteAsync(conflict, new PartitionKey(conflicting.TenantId));
    }
}

Manual mode is the most flexible and the most operationally demanding: if nobody drains the feed, conflicts accumulate and your data quietly diverges from what users expect. Run the drainer as a continuously scheduled job and alert if the feed depth grows. The operational obligations of manual mode, in order of how often they are missed:

Obligation	Why it matters	If skipped	How to meet it
A running drainer	Feed doesn’t drain itself	Divergence accumulates forever	Continuous Function / worker on a timer
Idempotent merge logic	Drainer may reprocess entries	Double-applied resolutions	Resolve to absolute state; delete after handling
Delete after resolving	Entries persist until removed	Feed grows unbounded	`Conflicts.DeleteAsync` per handled entry
Depth alerting	Silent backlog is invisible	Stale data, no signal	Alert on conflict activity / feed depth
Per-partition scoping	Feed is per container/partition	Missed conflicts in other partitions	Iterate all partitions or use feed ranges

How to host the drainer, with the trade-offs of each option:

Host	Trigger	Scaling	Cost	When to choose
Azure Function (timer)	Cron (e.g. every 1 min)	Consumption/Flex auto	Lowest; pay per run	Default for most teams
Azure Function (Cosmos trigger)	Change feed	Lease-based parallelism	Low	When you already process the change feed
Container App job	Scheduled / KEDA	KEDA queue/cron	Low–medium	Already on Container Apps
AKS CronJob	Kubernetes cron	Pod replicas	Medium	Already on AKS
Always-on worker (App Service)	Continuous loop	Manual instances	Medium	Need sub-second drain latency

7. Automatic vs manual failover and testing outages

Two independent settings govern regional failover:

enableAutomaticFailover — if the write region (under single-write) becomes unavailable, Cosmos promotes the next region by failoverPriority. With multi-region writes on, this is largely moot for writes because every region already writes; the SDK simply stops routing to the down region. Keep it on regardless.
Service-managed vs manual failover for reads/priority — you can trigger a manual failover to validate behavior or to drain a region for maintenance.

How the two failover modes differ in practice:

Aspect	Automatic (service-managed) failover	Manual failover
Trigger	Cosmos detects region unavailability	You run `failover-priority-change`
Use case	Real outages, unattended	Rehearsals, planned maintenance drains
Write impact (single-write)	Promotes next priority region	You choose the new priority 0
Write impact (multi-write)	None — all regions already write	Reorders priority only
Risk	None to enable; recommended	Re-prioritizes for real — use a window
Data loss	Up to RPO of the consistency level	Same; rehearse to measure it

Trigger a controlled failover to rehearse an outage. This actually reprioritizes regions; run it in a test account or a planned window:

# Promote West Europe to priority 0 (simulate losing East US 2 as primary)
az cosmosdb failover-priority-change \
  --name kv-cosmos-prod \
  --resource-group rg-data-prod \
  --failover-policies "West Europe=0" "East US 2=1" "Southeast Asia=2"

On the client side, your CosmosClient should be configured with an explicit preferred-regions list so it fails over locally without a config change:

var client = new CosmosClient(connectionString, new CosmosClientOptions
{
    ApplicationPreferredRegions = new List<string>
    {
        "East US 2", "West Europe", "Southeast Asia"  // ordered preference
    },
    ConnectionMode = ConnectionMode.Direct
});

With ApplicationPreferredRegions set, the SDK automatically retries the next region on a regional failure — you do not redeploy to fail over. The client-side knobs that make failover transparent:

Client option	What it does	Default	Set it to	Why
`ApplicationPreferredRegions`	Ordered region preference for routing/retry	account default order	Your latency-ordered region list	Local failover with no redeploy
`ApplicationRegion`	Single home region; the SDK derives the fallback order by proximity	none	Prefer `ApplicationPreferredRegions` instead	An explicit list gives you a controlled fallback order
`ConnectionMode`	Direct (TCP) vs Gateway (HTTPS)	Direct (SDK v3)	Direct for lowest latency	Fewer hops; honors region routing
`ConsistencyLevel` (client)	Relax account default per client	account default	Only to relax	Cheaper reads where tolerable
`MaxRetryAttemptsOnRateLimited...`	Throttle retry behavior	SDK default	Tune for 429 storms	Smooths transient throttling
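
As a mental model for what an ordered preference list buys you, the selection rule can be pictured as "first preferred region that is still reachable". The sketch below is only that model: `pickRegion` is a hypothetical function, and the real SDK also handles endpoint discovery, retries and backoff that are not shown here.

```javascript
// Illustrative model only: choose the first preferred region that is not marked down.
// Returns null when every preferred region is unavailable.
function pickRegion(preferredRegions, unavailableRegions) {
  const down = new Set(unavailableRegions);
  const region = preferredRegions.find((r) => !down.has(r));
  return region === undefined ? null : region;
}
```

With the list from the example above, losing East US 2 moves traffic to West Europe, and losing both moves it to Southeast Asia, with no configuration change in between.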

Test this for real: block egress to the primary region’s Cosmos endpoint (NSG rule or local firewall) and confirm your service keeps serving from the next region within the SDK’s retry window.

The three regional topologies side by side — pick the cheapest one that meets your write locality need, not your read need:

Property	Single-write + replicas	Multi-write (2 regions)	Multi-write (3+ regions)
Who accepts writes	One primary only	Both regions	All regions
Write latency (far users)	Cross-region round trip	Local in either region	Local everywhere
Conflicts possible	No	Yes	Yes (more likely)
Strong consistency	Allowed	No	No
Write RU cost	1× write + read RU	~2× write RU	~N× write RU
RTO (writes)	Promotion time	≈0	≈0
Best for	Global reads, single writer	Two-continent writes	True global active-active
Operational complexity	Lowest	Medium (conflicts)	Highest (conflicts + cost)

8. Validating RPO/RTO and monitoring replication latency

Numbers you should be able to quote for a multi-region-write account:

RTO is effectively near-zero for writes under multi-region writes, because every region is already a write region; there is no promotion step on the write path.
RPO depends on consistency level. Under Strong RPO is 0 — but Strong is unavailable with multi-region writes. Under Bounded Staleness, RPO is bounded by your configured staleness window. Under Session/Consistent Prefix/Eventual, RPO is non-zero and unbounded in the worst case. This is the core trade-off: multi-region writes buy you RTO at the cost of a non-zero RPO.

RPO/RTO by configuration, the table you put in the DR runbook:

Configuration	RTO (writes)	RTO (reads)	RPO	Notes
Single-write, Strong	Failover promotion time	~0 (other replicas)	0	Linearizable; no multi-write
Single-write, Bounded Staleness	Failover promotion time	~0	≤ staleness window	Common single-write DR posture
Multi-write, Bounded Staleness	≈0 (all write)	~0	≤ staleness window	Best bounded-RPO active-active
Multi-write, Session	≈0	~0	Non-zero, unbounded worst case	Cheapest; per-user correctness only
Multi-write, Consistent Prefix	≈0	~0	Non-zero, unbounded	Ordered feeds
Multi-write, Eventual	≈0	~0	Non-zero, unbounded	Most available, least fresh
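
For the Bounded Staleness rows, "≤ staleness window" can be turned into a rough number for the runbook. The sketch assumes a steady write rate and uses the rule that lag is capped by whichever of the two limits (versions or time) is reached first; `maxWritesAtRisk` is a hypothetical helper, and the result is an upper bound, not a measurement.

```javascript
// Upper bound on writes that may not yet have replicated under Bounded Staleness,
// assuming a steady write rate. The lag is capped by the version limit or the
// time limit, whichever is hit first.
function maxWritesAtRisk(writesPerSecond, maxStalenessPrefix, maxIntervalInSeconds) {
  return Math.min(maxStalenessPrefix, writesPerSecond * maxIntervalInSeconds);
}
```

At the multi-region floor (100000 versions / 300 s), a workload doing 50 writes per second is bounded by time at 15,000 writes, while one doing 1,000 writes per second is bounded by the prefix at 100,000 writes.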

Monitor replication latency continuously. The relevant metric is Replication Latency (P50/P99 by source/target region) in Azure Monitor:

// P99 cross-region replication latency, by region pair, last 6h
AzureMetrics
| where ResourceProvider == "MICROSOFT.DOCUMENTDB"
| where MetricName == "ReplicationLatency"
| where TimeGenerated > ago(6h)
| summarize p99 = percentile(Average, 99) by bin(TimeGenerated, 5m), Resource
| order by TimeGenerated desc

Also alert on the conflict path so silent divergence cannot hide:

// Surfacing custom/manual conflict activity
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DOCUMENTDB"
| where Category == "DataPlaneRequests"
| where OperationName has "Conflict"
| summarize count() by bin(TimeGenerated, 15m), requestResourceType_s

The signals worth wiring as alerts — leading indicators, not lagging “users complained”:

Signal	Metric / source	Starting threshold	Why it’s leading
Replication lag	`ReplicationLatency` P99 by region	> your RPO budget	Predicts data-at-risk before a region loss
Conflict activity	`DataPlaneRequests` conflict ops	any sustained > 0 in manual mode	Divergence is happening now
Conflicts-feed depth	App-emitted gauge from the drainer	> 0 for 5 min	Nobody is reconciling
Throttling (429)	`TotalRequestUnits` / 429 rate	> 1% throttled	Multi-write amplifies write RU
Region availability	Service Health / `ServiceAvailability`	any region degraded	Triggers the RPO clock
Provisioned vs used RU	`ProvisionedThroughput` vs `TotalRequestUnits`	sustained > 80%	Multi-write writes cost N×
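
The conflicts-feed depth row ("> 0 for 5 min") is a sustained-condition check, which is easy to get subtly wrong. A minimal sketch of the evaluation, assuming the drainer emits `{ t, depth }` samples in time order; `feedDepthAlert` and the sample shape are hypothetical, and in practice an Azure Monitor alert rule would do this for you.

```javascript
// samples: [{ t: epochMs, depth: number }] sorted by time, ascending.
// Fires when depth has stayed above zero continuously for at least windowMs,
// measured back from the most recent sample.
function feedDepthAlert(samples, windowMs) {
  if (samples.length === 0) return false;
  const latest = samples[samples.length - 1];
  if (latest.depth <= 0) return false;
  let since = latest.t;
  for (let i = samples.length - 1; i >= 0 && samples[i].depth > 0; i--) {
    since = samples[i].t;
  }
  return latest.t - since >= windowMs;
}
```

A single sample at zero resets the clock, which is the behavior you want: a drainer that catches up, even briefly, is working.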

Architecture at a glance

The diagram traces a write as it actually flows through a multi-region-write account, then maps each place data can diverge or be lost as a numbered badge. Read it left to right. On the far left, the App + SDK issues a write with an ordered ApplicationPreferredRegions list and (under Session) a session token; multiple writers can target the same id + partition key from different regions. The write hits the account gateway on :443, which routes it to the nearest write region — and the consistency knob here is where badge 1 lives: you cannot select Strong on this path, only Bounded Staleness, Session, Consistent Prefix or Eventual. The middle zone is the heart of multi-write: East US 2, West Europe and Southeast Asia each commit and ACK locally (badge 2 marks West Europe accepting a concurrent edit to a document East US 2 just changed). From there the replication zone ships those local commits asynchronously; badge 3 sits on the replication hop because whatever has not yet replicated when a region is lost is exactly your RPO. When two live versions of one document meet, the detect-clash node fires, and the flow turns into the resolution zone.

The resolution zone is the design decision the whole article is about. The LWW path node (badge 4) resolves on a numeric property, and the warning is that the default path, _ts, ties at one-second granularity and silently drops a real write. The sproc / feed node (badge 5) is the deterministic alternative: a custom resolver that merges or applies business rules, or a manual conflicts feed your app must drain and alert on. The legend narrates each number as symptom · how to confirm · fix: read the badge, run the named az/Azure Monitor confirm step, apply the fix. The single sentence to carry away from the picture: the request path buys you write locality and near-zero RTO, and every badge is a place you pay for it in consistency, RPO, or conflict-resolution correctness.

Real-world scenario

Aurelia Pay, a fictional global payments platform, ran a payments-ledger container with three write regions (East US 2, West Europe, Southeast Asia) at Session consistency to meet a sub-50 ms write SLO across the Americas, EU and APAC. Their idempotency layer keyed on a client-supplied paymentId, and the write path did a read-modify-write to advance a status field (0=pending, 1=authorized, 2=captured, 3=refunded). The container used the default LWW on _ts. The platform team was six engineers; the Cosmos spend was about ₹240,000/month (three write regions multiply the write RU).

The incident began during a partial network partition between East US 2 and West Europe — a real BGP event lasting about nine minutes. A retrying client authorized a payment in West Europe while a parallel capture landed in East US 2 for the same paymentId. Both committed locally and ACK’d; the partition kept them apart. When replication healed, the two versions met and Cosmos resolved the conflict on _ts. Because both writes fell in the same second, _ts tied, Cosmos kept the authorize as the winner, and the capture was silently discarded — money had moved, the ledger said “authorized.” Nothing surfaced in any feed (LWW never populates it). They found it 31 hours later when the daily reconciliation against the processor disagreed by a five-figure sum.

The breakthrough was framing the bug correctly. This was not a Cosmos defect and not an application race they could lock away — under multi-region writes, concurrent same-document edits across regions are expected. The defect was the resolution policy: resolving an ordered state machine on a timestamp. The constraint made it harder: they could not tolerate any state regression, and they could not drop to single-region writes (the APAC latency SLO would break). The fix was a custom resolver sproc that resolves on the business state machine instead of a timestamp — the higher status rank always wins, and a refund (3) is terminal (absorbing):

function resolveLedger(incoming, existing, isTombstone, conflicts) {
  var ctx = getContext(), coll = ctx.getCollection(), res = ctx.getResponse();
  // incoming is null for a delete; ledger entries are never removed by a
  // conflict, so only non-null versions take part in the ranking.
  var all = [existing, incoming].concat(conflicts || []).filter(Boolean);
  // Terminal states win; otherwise the highest status rank wins.
  var winner = all.reduce(function (best, c) {
    if (best === null) return c;
    if (best.status === 3) return best;      // refund is absorbing
    if (c.status === 3) return c;
    return (c.status > best.status) ? c : best;
  }, null);
  if (!winner) return;                       // nothing to write
  coll.upsertDocument(coll.getSelfLink(), winner, function (e) { if (e) throw e; });
  res.setBody(winner);
}
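
The ranking rule is small enough to pull out and unit-test without a Cosmos account. The sketch below restates it as a pure function; `pickWinner` and the `REFUNDED` constant are hypothetical names, and the status codes are the ones from the scenario (0=pending, 1=authorized, 2=captured, 3=refunded).

```javascript
const REFUNDED = 3; // terminal (absorbing) state in the scenario's state machine

// Return the version that should survive: a refund always wins,
// otherwise the highest status rank wins. Null entries are ignored.
function pickWinner(versions) {
  const live = versions.filter(Boolean);
  if (live.length === 0) return null;
  return live.reduce((best, c) => {
    if (best.status === REFUNDED) return best;
    if (c.status === REFUNDED) return c;
    return c.status > best.status ? c : best;
  });
}
```

The incident's conflict (authorize in one region, capture in the other) resolves to the capture regardless of which version arrives first, which is exactly what a timestamp tie could not guarantee.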

They also moved the LWW-style fields they could safely auto-merge (audit tags, lastTouchedBy) into the same sproc so nothing fell back to _ts. Because the policy cannot be changed in place, the ledger moved to a new container created with the Custom policy, so a sproc failure would route to the conflicts feed rather than silently dropping a write. Post-change, a six-month reconciliation run showed zero ledger regressions. The conflicts-feed alert (depth > 0 for more than five minutes) plus a ReplicationLatency P99 alert gave them the early-warning signals they had been missing entirely. The before/after, because the contrast is the lesson:

Dimension	Before (default LWW on `_ts`)	After (custom resolver sproc)
Resolution basis	Last-modified timestamp, 1 s granularity	Business `status` rank, refund absorbing
Same-second conflict	Tie → arbitrary winner, capture lost	Higher status wins deterministically
Loser visibility	None (LWW never populates feed)	Sproc folds all versions; failures → feed
Reconciliation result	Five-figure mismatch after 31 h	Zero regressions over six months
Detection signal	Out-of-band daily reconciliation	Feed-depth + replication-latency alerts
Write SLO (APAC)	Met (Session, multi-write)	Still met — no topology change
Cost	₹240,000/mo (3 write regions)	Unchanged; the fix was the policy

The line the team wrote into their design guide: on a multi-region-write account, the conflict-resolution policy is part of your data model, not an afterthought — and default LWW on _ts is almost never correct for stateful, ordered domains.

Advantages and disadvantages

Multi-region writes both unlock global low-latency write workloads and introduce the distributed-systems tax. Weigh it honestly:

Advantages (why you reach for it)	Disadvantages (why it bites)
Local write latency everywhere — nearest region ACKs, no cross-ocean write round trip	Concurrent same-doc edits across regions are now possible; you must resolve conflicts
RTO for writes ≈ 0 — every region already writes, no promotion step on a region loss	RPO is non-zero for every multi-write level; only Bounded Staleness caps it
Higher write availability — a region loss doesn’t zero out write capability	Strong consistency is off the table entirely; you lose linearizability
Pluggable resolution (LWW path, sproc, manual feed) fits ordered or mergeable domains	The policy is set at container creation and effectively immutable — a wrong choice means a migration
Built-in conflicts feed is a safety net for custom/sproc failures	Manual mode silently diverges if nobody drains the feed
Session consistency keeps per-user read-your-writes cheap	Cross-session reads can miss recent writes unless you flow the token
Bounded Staleness gives a contractual freshness SLA you can promise	Enforced minimums (100000 ops / 300 s) may be looser than you’d like
Scales globally without app-level sharding of the write path	Provisioned RU cost scales with the region count: ≈ N× for N write regions

The model is right for globally distributed, write-active, latency-sensitive workloads where the data is either tolerant (telemetry, counters, sessions) or has a resolvable conflict story (an ordered state machine you can rank, or documents you can merge). It is wrong when you need true linearizability (use single-write Strong), when the data has no sane merge and any loss is unacceptable without heavy custom work, or when only one region actually writes (then you want read replicas, not the N× write-RU bill). The disadvantages are all manageable — but only if you treat consistency and conflict resolution as first-class design, which is the entire point of this article.

Hands-on lab

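Stand up a two-region multi-write account, create one container on default LWW (_ts) and one on LWW over /version, then verify the bound policy and the write topology from the CLI. That inspection is the deterministic check that separates a container that can drop a same-second write from one that resolves on a version you own. Everything runs on a single account (we add a second region briefly; delete at the end to stop the RU/region cost). Run in Cloud Shell (Bash) unless noted.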

Step 1 — Variables and resource group.

RG=rg-cosmos-lab
ACC=kvcosmoslab$RANDOM        # globally-unique account name
LOC1=eastus2
LOC2=westeurope
az group create -n $RG -l $LOC1 -o table

Step 2 — Create a single-region account at Session consistency.

az cosmosdb create -n $ACC -g $RG \
  --locations regionName=$LOC1 failoverPriority=0 isZoneRedundant=false \
  --default-consistency-level Session -o table

Expected: an account row; enableMultipleWriteLocations defaults to false.

Step 3 — Add a second region and enable multi-region writes.

az cosmosdb update -n $ACC -g $RG \
  --locations regionName=$LOC1 failoverPriority=0 isZoneRedundant=false \
  --locations regionName=$LOC2 failoverPriority=1 isZoneRedundant=false
az cosmosdb update -n $ACC -g $RG --enable-multiple-write-locations true
az cosmosdb show -n $ACC -g $RG --query "enableMultipleWriteLocations" -o tsv

Expected: the final command prints true, and writeLocations now lists both regions.

Step 4 — Create a DB and two containers: one default-LWW, one LWW-on-/version.

az cosmosdb sql database create -a $ACC -g $RG -n shop -o table

# Container A: default LWW (resolves on _ts)
az cosmosdb sql container create -a $ACC -g $RG -d shop -n orders_ts \
  --partition-key-path "/tenantId" --throughput 400 -o table

# Container B: LWW on a numeric /version you own (policy is passed as JSON)
az cosmosdb sql container create -a $ACC -g $RG -d shop -n orders_ver \
  --partition-key-path "/tenantId" \
  --conflict-resolution-policy '{"mode": "LastWriterWins", "conflictResolutionPath": "/version"}' \
  --throughput 400 -o table

Step 5 — Inspect the bound policy on each container (this is the verification that matters).

az cosmosdb sql container show -a $ACC -g $RG -d shop -n orders_ts \
  --query "resource.conflictResolutionPolicy" -o json
az cosmosdb sql container show -a $ACC -g $RG -d shop -n orders_ver \
  --query "resource.conflictResolutionPolicy" -o json

Expected: orders_ts shows "conflictResolutionPath": "/_ts" (the default); orders_ver shows "conflictResolutionPath": "/version". This single difference is the whole lesson — the ordered-domain container must not resolve on _ts.
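
If you want this check in a pipeline rather than by eye, the JSON printed by the two commands above can be fed to a small gate. This is a sketch under assumptions: `checkPolicyForOrderedData` is a hypothetical function, and the rule it encodes (LWW on `/_ts` is unsafe for ordered data) is this article's guidance, not a service-enforced constraint.

```javascript
// policy: the object printed by
//   az cosmosdb sql container show ... --query "resource.conflictResolutionPolicy" -o json
// Returns { ok, reason } for a container that holds ordered or stateful data.
function checkPolicyForOrderedData(policy) {
  const mode = String(policy.mode || '').toLowerCase();
  if (mode === 'custom') {
    return policy.conflictResolutionProcedure
      ? { ok: true, reason: 'custom resolver stored procedure' }
      : { ok: true, reason: 'manual conflicts feed: confirm a drainer and a depth alert exist' };
  }
  const path = policy.conflictResolutionPath || '/_ts';
  return path === '/_ts'
    ? { ok: false, reason: 'LWW on _ts: same-second ties can drop a write' }
    : { ok: true, reason: 'LWW on ' + path };
}
```

Run against the lab, `orders_ts` fails the gate and `orders_ver` passes it.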

Step 6 — Confirm consistency and write topology, the production gate.

az cosmosdb show -n $ACC -g $RG \
  --query "{multiWrite:enableMultipleWriteLocations, consistency:consistencyPolicy.defaultConsistencyLevel, writeRegions:writeLocations[].locationName}" -o json

Expected: multiWrite: true, consistency: Session, and both regions under writeRegions. (To observe a real cross-region conflict resolve, you would write the same id+tenantId through two clients, each pinned to a different region, close enough together that both commit before replication catches up. The local emulator does not replicate across regions, so this needs the live account, and the timing is not deterministic; the policy inspection in Step 5 is the deterministic check.)

The lab steps mapped to what each proves:

Step	What you did	What it proves	Real-world analogue
3	Enable multi-write on a 2-region account	Every region becomes a write region	The decision that introduces conflicts
4	Two containers, two LWW paths	The policy is per-container and set at creation	Choosing the policy as a data-model decision
5	Inspect `conflictResolutionPolicy`	`_ts` default vs `/version` is visible and real	The 90-second “is this safe?” check
6	Confirm multi-write + consistency	The production gate before go-live	Pre-prod sign-off

Cleanup (stop the per-region RU cost):

az group delete -n $RG --yes --no-wait

Cost note. Two 400-RU/s containers across two write regions for under an hour is a few rupees; the multi-region multiplier is what you are watching, and deleting the resource group stops all of it. Always delete lab accounts — an idle multi-region account still bills provisioned RU in every region.

Common mistakes & troubleshooting

This is the playbook — the part you bookmark. First as a scannable table you can read mid-incident, then the entries that bite hardest expanded with the full confirm detail.

#	Symptom	Root cause	Confirm (exact cmd / portal path)	Fix
1	Reconciliation finds missing updates; no errors anywhere	Default LWW on `_ts` dropped a write on a same-second tie	`az cosmosdb sql container show --query resource.conflictResolutionPolicy` shows `/_ts`	New container with LWW on `/version`, or a custom sproc
2	`enable-multiple-write-locations true` is rejected	Account is at Strong consistency	`az cosmosdb show --query consistencyPolicy.defaultConsistencyLevel` = Strong	Set Bounded Staleness/Session first, then enable multi-write
3	Setting Bounded Staleness fails on a multi-region account	Window below the multi-region floor	Error cites `maxStalenessPrefix`/`maxIntervalInSeconds`	`--max-staleness-prefix 100000 --max-interval 300` (or larger)
4	Data quietly diverges between regions over days	Custom (no sproc) feed nobody drains	`Conflicts.GetConflictQueryIterator` returns entries; depth alert never built	Build a continuous drainer + a feed-depth alert
5	Cross-session reader in region B misses a write from region A	Session consistency, token not flowed	App tiers don’t pass `x-ms-session-token`	Propagate the session token, or use Bounded Staleness
6	LWW “loses” the newer write under clock skew	LWW path is client wall-clock time	Path is a client timestamp; regions’ clocks differ	Use a monotonic per-doc version (RMW) or HLC
7	Sproc resolver produces different state in different regions	Non-deterministic / non-idempotent resolver	Resolver reads `Date.now()`/random; outputs differ	Make it deterministic + idempotent (absolute state)
8	Can’t change the conflict policy on a live container	Policy is immutable after creation	`az cosmosdb sql container show` policy is fixed	New container + change-feed migration, cut over behind a flag
9	Writes throttle (429) after enabling multi-write	Write RU is now N× across regions	`TotalRequestUnits` high; 429 rate up	Raise provisioned/autoscale RU, or reduce write regions
10	Client keeps hitting a down region after failover	No `ApplicationPreferredRegions` set	SDK options lack the ordered region list	Set `ApplicationPreferredRegions`; Direct mode
11	“Multi-write is on” but one region won’t accept writes	A region is still a read replica (toggle not applied)	`writeLocations` lacks that region	Re-run `--enable-multiple-write-locations true`; verify
12	RPO bigger than expected after a region loss	Consistency is Session/Eventual (unbounded RPO)	`consistencyPolicy` not Bounded Staleness	Move to Bounded Staleness to cap the lag window
13	Deletes “come back” after replication	Delete lost the conflict to a concurrent update	Doc reappears; LWW path favored the update	Decide delete-wins in a sproc (`isTombstone`)
14	Cassandra/Mongo API: no custom resolver available	Custom sproc/feed is NoSQL-API only	Wrong API for pluggable resolution	Use NoSQL API, or accept LWW on those APIs

The expanded form for the entries that bite hardest:

1. Reconciliation finds missing updates; nothing errored. Root cause: Default LWW on _ts resolved a conflict on a one-second timestamp tie and silently discarded a real write; LWW never populates the conflicts feed, so there is no error trail. Confirm: az cosmosdb sql container show --account-name <acc> -g <rg> -d <db> -n <container> --query "resource.conflictResolutionPolicy" shows mode: LastWriterWins and conflictResolutionPath: /_ts. Fix: For ordered/stateful data, create a new container with LWW on a monotonic /version you own, or a custom resolver sproc that ranks on business state. Migrate via the change feed; the policy can’t be changed in place.

2. Enabling multi-region writes is rejected. Root cause: The account is at Strong consistency, which is incompatible with multi-region writes (linearizability needs one global order). Confirm: az cosmosdb show -n <acc> -g <rg> --query "consistencyPolicy.defaultConsistencyLevel" returns Strong. Fix: Lower to Bounded Staleness (or Session) first — az cosmosdb update --default-consistency-level BoundedStaleness --max-staleness-prefix 100000 --max-interval 300 — then --enable-multiple-write-locations true.

3. Setting Bounded Staleness fails on a multi-region account. Root cause: The window is below the multi-region floor (maxStalenessPrefix >= 100000, maxIntervalInSeconds >= 300). Confirm: The CLI error names the parameter that is too small. Fix: Pass values at or above the floor: --max-staleness-prefix 100000 --max-interval 300. Tighter windows are only allowed on single-region accounts.

4. Data quietly diverges between regions over days. Root cause: Custom (no sproc) policy writes conflicting versions to the conflicts feed, and nobody drains it — so divergence accumulates invisibly. Confirm: container.Conflicts.GetConflictQueryIterator<ConflictProperties>() returns entries; you have no alert on feed depth or conflict activity. Fix: Run a continuous drainer (Function/worker) that resolves each entry, deletes it after handling, and emits a depth gauge; alert on depth > 0 sustained.

5. A cross-session reader misses a recent write. Root cause: Session consistency scopes read-your-writes to the session token; a different client/tier reading in another region without the token can miss a just-written value. Confirm: The write tier captures response.Headers.Session but downstream readers don’t pass it back as SessionToken. Fix: Flow the session token across tiers (header/cookie), or move readers that can’t carry it to Bounded Staleness for a global bounded guarantee.

6. LWW loses the newer write under clock skew. Root cause: The LWW path is a client wall-clock timestamp; regional clock skew means the “later” write can carry the smaller number and lose. Confirm: The conflictResolutionPath points at a client-set time field; regions’ clocks differ by more than the conflict window. Fix: Resolve on a monotonic per-document version advanced by read-modify-write, or a hybrid logical clock — never raw client time.

8. Can’t change the conflict policy on a live container. Root cause: The conflict-resolution policy is set at container creation and effectively immutable. Confirm: az cosmosdb sql container show ... --query "resource.conflictResolutionPolicy" shows the old policy and no SDK/portal path changes it. Fix: Create a new container with the right policy, drain the change feed into it with a Function (live backfill), and cut over behind a feature flag — the same pattern as a partition-key change.

9. Writes throttle (429) right after enabling multi-write. Root cause: Write RU is now multiplied across write regions; the provisioned/autoscale ceiling that was fine for one write region is now insufficient. Confirm: TotalRequestUnits climbs and the 429 rate rises; the account shows N write regions. Fix: Raise provisioned or autoscale max RU/s to cover N× write cost, or reduce the number of write regions (keep some as read replicas).
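
The LWW failures in entries 1 and 6 can be reproduced with a toy model. This is a minimal sketch, not the service's implementation: `Doc`, `lww_winner` and the field names are hypothetical, `ts` stands in for `_ts`, and the tie-break shown (keep the first argument) is an assumption, since the application does not control which side the service keeps on a tie.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    id: str
    state: str
    ts: int        # stand-in for _ts: last-modified time in whole seconds
    version: int   # application-owned counter, bumped on every read-modify-write


def lww_winner(a: Doc, b: Doc, path: str) -> Doc:
    """Keep the document with the higher numeric value at `path`.

    On a tie this toy model keeps `a`. The real service also keeps exactly
    one side, but the application does not get to choose which.
    """
    return b if getattr(b, path) > getattr(a, path) else a


# Two regions update the same payment within the same second.
authorize = Doc(id="pay-1", state="authorized", ts=1_700_000_000, version=3)
capture = Doc(id="pay-1", state="captured", ts=1_700_000_000, version=4)

by_ts = lww_winner(authorize, capture, "ts")
by_version = lww_winner(authorize, capture, "version")

print(by_ts.state)       # authorized (tie on ts, the capture is discarded)
print(by_version.state)  # captured
```

With the comparison on `version`, the result no longer depends on argument order or on which region's clock ticked first.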

Best practices

Treat consistency and conflict resolution as data-model decisions, reviewed in design, not toggles flipped at deploy. They determine correctness and RPO, not just performance.
Never select Strong if you intend multi-region writes — it is rejected. Default to Bounded Staleness when you need a freshness SLA, Session when the workload is per-user and you control the token.
Never leave an ordered/stateful container on LWW-on-_ts. Use LWW on a monotonic numeric /version you own, or a custom resolver sproc that ranks on business state.
Make LWW paths guaranteed-monotonic — a version counter advanced by read-modify-write, or a hybrid logical clock. Client wall-clock time turns skew into silent data loss.
Write resolver sprocs deterministic and idempotent, resolving to an absolute state (not a delta). Cosmos may invoke them more than once; non-determinism diverges regional state.
Give the conflicts feed an owner in manual mode: a continuously running drainer that deletes entries after handling, plus an alert on feed depth. An undrained feed is invisible divergence.
Flow the session token across tiers wherever read-your-writes matters at Session level — header or cookie — or step up to Bounded Staleness for readers that can’t carry it.
Set the conflict policy at container creation, knowing it’s immutable; budget a change-feed migration if you ever need to change it.
Configure ApplicationPreferredRegions (ordered) on every CosmosClient so a region loss fails over locally with no redeploy; use Direct connection mode.
Keep enableAutomaticFailover on regardless — it’s harmless under multi-write and essential under single-write.
Right-size RU for N× writes before enabling multi-write; autoscale absorbs aggregate spikes but the baseline write cost multiplies per write region.
Rehearse a regional outage (manual failover or endpoint block) and document RPO/RTO per consistency level in the DR runbook; monitor ReplicationLatency P99 and conflict activity with alerts.
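
The "deterministic, idempotent, absolute state" rule for resolvers can be illustrated with the decision logic alone. A real resolver is a JavaScript stored procedure; the sketch below models only the ranking in Python. `STATE_RANK`, `writeRegion` and `resolve` are hypothetical names chosen for the example, and tombstone handling is left out.

```python
from typing import Optional

# Hypothetical business ordering for a payment state machine (an assumption,
# not something Cosmos DB defines). Higher rank wins.
STATE_RANK = {"created": 0, "authorized": 1, "captured": 2, "refunded": 3}


def sort_key(doc: dict) -> tuple:
    # Rank on business state first, then on the app-owned version, then on a
    # stable tie-breaker so every region picks the same winner.
    return (STATE_RANK[doc["state"]], doc["version"], doc["writeRegion"])


def resolve(incoming: dict, existing: Optional[dict], conflicting: list) -> dict:
    """Return the winning document, in full, from all candidate versions.

    Pure function: no clock, no randomness, no deltas. Running it again with
    the winner among the inputs returns the same winner.
    """
    candidates = [incoming, *conflicting]
    if existing is not None:
        candidates.append(existing)
    return max(candidates, key=sort_key)


authorize = {"id": "pay-1", "state": "authorized", "version": 5, "writeRegion": "westeurope"}
capture = {"id": "pay-1", "state": "captured", "version": 4, "writeRegion": "eastus2"}

winner = resolve(authorize, capture, [])
print(winner["state"])  # captured
```

Because the result is the maximum under a total order, swapping the arguments or re-running the function on its own output yields the same document, which is the property that keeps regions from diverging.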

Security notes

Use managed identity, not keys, for the data plane. Cosmos supports Microsoft Entra ID RBAC for data operations; assign the Cosmos DB Built-in Data Contributor (or a scoped custom role) to the app’s managed identity instead of distributing account keys. Keys are account-wide and hard to rotate without downtime.
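Disable key-based auth where possible. Set disableLocalAuth to true on the account so only Entra ID identities can reach the data plane, and run az cosmosdb update --disable-key-based-metadata-write-access true so account keys cannot be used to change databases, containers or throughput. If you must keep keys, store them only as Key Vault references and rotate on a schedule (see Azure Key Vault Secret Rotation with Managed Identity).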
Lock the network with Private Endpoints. Put a private endpoint in each region’s VNet and set publicNetworkAccess: Disabled so the account isn’t reachable from the internet; multi-region means one private endpoint per region plus private DNS. See Private Endpoint vs Service Endpoint.
Resolver sprocs run server-side with collection access, so treat them as privileged code. Review them in PRs, keep them deterministic, and never embed secrets in them. Sprocs cannot make outbound calls, so do not design a resolver that depends on one.
Scope data-plane role assignments to the database or container rather than the whole account, so a compromised app identity can’t read or mutate unrelated containers. Apply least privilege on the data plane, not just the control plane.
Encrypt with customer-managed keys (CMK) if compliance requires it; data is encrypted at rest by default with Microsoft-managed keys, and CMK lets you hold the key in Key Vault and revoke access.
Audit conflict and data-plane activity to Log Analytics (DataPlaneRequests) so unexpected writes or conflict storms are visible and attributable.

Several of these security controls also prevent operational incidents; security and resilience pull in the same direction here:

Control	Setting / mechanism	Secures against	Also prevents
Entra RBAC data plane	Built-in Data Contributor + managed identity	Account-key sprawl and leakage	Rotation breakage from hard-coded keys
Disable public access	`publicNetworkAccess: Disabled` + private endpoints	Internet-reachable data	Exfiltration over public endpoints
Per-region private endpoint	PE + private DNS per region	Cross-region traffic on the public internet	DNS misresolution sending writes off-region
Scoped RBAC roles	Custom data roles per container	Lateral movement across containers	A bad app touching unrelated data
CMK encryption	Key Vault-held key	Provider-side data exposure	Loss of crypto-shred / revoke capability
Data-plane diagnostics	`DataPlaneRequests` to Log Analytics	Undetected anomalous writes	Silent conflict divergence going unseen

Cost & sizing

The bill drivers and how they interact with multi-region writes:

Provisioned/autoscale RU/s is the dominant cost, and writes multiply by the number of write regions. One write region at 10,000 RU/s costs roughly one unit; three write regions cost roughly three for the write portion, because every write replicates and is billed in each region. This is the single biggest reason to ask “do I need write locality here, or just read locality?”
Read replicas cost RU too, but independently. A region you add as a read replica (not a write region) bills its own RU for reads; you scale it on its own. Two write regions plus a read replica is often cheaper and simpler than three write regions.
Storage is billed per GB per region — every region holds a full copy, so storage also multiplies by region count (read or write).
Autoscale vs manual: autoscale bills 1.5× the equivalent manual rate per RU but scales automatically between 10% and 100% of the configured maximum RU/s; for spiky multi-region writes it prevents 429 storms, but the baseline still multiplies per write region.
Egress / replication traffic between regions is part of the service; the cost you control is RU and storage, plus any cross-region traffic your app generates.

A rough monthly picture (INR, indicative — verify with the Azure pricing calculator for your regions):

Configuration	Write RU model	Storage	Rough INR / month	When it’s the right shape
1 write region, 10k RU/s	1× write RU	1× per GB	~₹85,000	Single-region write, global reads not needed
1 write + 2 read replicas	1× write + read RU	3× per GB	~₹160,000	Global low-latency reads, single writer
2 write regions, 10k RU/s each	~2× write RU	2× per GB	~₹170,000	Two-continent write locality
3 write regions, 10k RU/s each	~3× write RU	3× per GB	~₹255,000	True global active-active writes
Autoscale 10k max, 3 write regions	~3× at 1.5× rate	3× per GB	~₹290,000 peak	Spiky global writes; avoids 429
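
The multiplication behind the one-, two- and three-write-region rows can be sketched as a one-line model. This is a rough illustration, assuming the write portion scales linearly with the number of write regions and using the table's indicative ₹85,000 figure as the per-region unit. `write_cost_inr` is a hypothetical helper; storage, reads and autoscale are left out.

```python
def write_cost_inr(base_inr_per_region: float, write_regions: int) -> float:
    """Linear model: the write portion scales with the number of write regions."""
    if write_regions < 1:
        raise ValueError("an account needs at least one write region")
    return base_inr_per_region * write_regions


# Indicative INR/month for one write region at 10k RU/s (taken from the table).
BASE = 85_000

for regions in (1, 2, 3):
    print(regions, write_cost_inr(BASE, regions))
```

The model is deliberately crude; its only purpose is to make the N× effect visible before the pricing calculator supplies real numbers.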

Right-sizing rules:

If you observe…	It usually means…	Do this
One region writes >> others	You’re paying N× for 1× benefit	Make the quiet regions read replicas
Sustained 429 after multi-write	Write RU ceiling too low for N×	Raise RU or autoscale max
RU far below provisioned but 429s	A hot partition, not a region issue	Fix the partition key (see partition-key article)
Bill dominated by storage	Many regions, large dataset	Trim regions or archive cold data
Bounded Staleness window very tight	Higher coordination latency/cost	Loosen toward the floor (100000/300)
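
The floor in the last row can be checked before a change ever reaches the CLI. A minimal sketch, assuming the multi-region minimums quoted in this guide (100000 operations, 300 seconds); `staleness_violations` and its parameter names are hypothetical and are not part of any Azure SDK.

```python
# Multi-region minimums as quoted in this guide.
MULTI_REGION_FLOOR = {"max_staleness_prefix": 100_000, "max_interval_seconds": 300}


def staleness_violations(max_staleness_prefix: int, max_interval_seconds: int) -> list:
    """Return the names of the parameters that are below the multi-region floor."""
    given = {
        "max_staleness_prefix": max_staleness_prefix,
        "max_interval_seconds": max_interval_seconds,
    }
    return [name for name, floor in MULTI_REGION_FLOOR.items() if given[name] < floor]


print(staleness_violations(100_000, 300))  # []
print(staleness_violations(50_000, 300))   # ['max_staleness_prefix']
```

An empty list means the window is at or above the floor; anything else names the value to raise.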

Interview & exam questions

1. Why is Strong consistency incompatible with multi-region writes? Strong guarantees linearizability, which requires a single global ordering of all writes. With multiple regions independently accepting and ACKing writes locally, no single global order exists, so the guarantee cannot hold. Cosmos therefore rejects enabling multi-region writes on a Strong account; you must drop to Bounded Staleness, Session, Consistent Prefix or Eventual first.

2. What does multi-region writes do to your provisioned RU cost, and why? It roughly multiplies the write RU cost by the number of write regions, because every write is committed and replicated in each write region and billed there. The mitigation is to make only the regions that truly need write locality into write regions and keep the rest as read replicas, which scale independently.

3. Default LWW resolves on _ts. Why is that dangerous for an ordered domain? _ts is a last-modified timestamp at one-second granularity; two concurrent writes in the same second tie, and Cosmos keeps one deterministically but arbitrarily, silently discarding the other (LWW never populates the conflicts feed). For a state machine (e.g. payments), this can drop a capture in favor of an authorize. Resolve on a monotonic /version you own or a custom sproc that ranks on business state.

4. Compare Bounded Staleness and Session for a multi-region-write account. Bounded Staleness gives a global, quantified freshness bound (no more than K versions or T seconds stale; minimums 100000 ops / 300 s on multi-region) and behaves like Strong within a region, which suits multi-reader workloads and freshness SLAs. Session gives read-your-writes only within a session token, shares the lowest read RU cost with Consistent Prefix and Eventual (roughly half that of Bounded Staleness reads), and is right for per-user workloads where you control token propagation; a different session in another region can miss a recent write.

5. What are the three conflict types, and how does each surface under LWW vs Custom? Insert (two regions create the same id+PK), replace/update (concurrent edits), delete (delete vs concurrent update). Under LWW all three resolve on the numeric path, winner committed, loser discarded silently. Under Custom-sproc your resolver receives the versions (with isTombstone for deletes) and decides. Under Custom-manual the conflicting versions land in the conflicts feed for your app to reconcile.

6. Walk through configuring LWW on a custom path and the invariants you must hold. Create the container with conflictResolutionPolicy.mode = LastWriterWins and conflictResolutionPath = /version. Invariants: the path is always present and numeric (missing = 0), monotonically increasing per document (so a stale retry loses), and unique enough to avoid ties on writes you care about. Prefer a version counter advanced by read-modify-write or a hybrid logical clock over client wall-clock time, which turns clock skew into data loss.

7. What’s the RTO and RPO of a multi-region-write account, and what governs each? RTO for writes is ≈0 because every region already writes — there’s no promotion step on a region loss; the SDK just stops routing to the down region. RPO is non-zero and is governed by the consistency level: Bounded Staleness caps it to the staleness window; Session/Consistent Prefix/Eventual leave it unbounded in the worst case; Strong (RPO 0) is unavailable here. You buy RTO and pay in RPO.

8. A resolver sproc produces different results in different regions. What’s wrong and how do you fix it? The sproc is non-deterministic or non-idempotent — likely reading Date.now(), a random value, or applying a delta rather than computing an absolute state. Cosmos may invoke the resolver more than once and in each region, so identical inputs must yield identical outputs. Fix by making it deterministic and idempotent, resolving to a fully-specified final document, and folding in conflictingItems.

9. You set Custom (no sproc) and data is quietly diverging. What did you forget? The conflicts feed has no owner. In manual mode Cosmos writes conflicting versions to the feed and stops; your application must drain it (read, resolve with a business rule, replace the committed doc, delete the feed entry) on a continuous schedule, and alert on feed depth. Without a drainer, divergence accumulates invisibly.

10. How do you make client failover transparent during a regional outage? Configure CosmosClientOptions.ApplicationPreferredRegions with an ordered region list and use Direct connection mode. On a regional failure the SDK automatically retries the next preferred region without a redeploy. Rehearse it by blocking egress to the primary region’s endpoint or running az cosmosdb failover-priority-change, and confirm the service keeps serving.

11. Can you change a container’s conflict-resolution policy after creation? What’s the migration if not? No — the policy is set at container creation and effectively immutable. To change it you create a new container with the desired policy, backfill via the change feed (an Azure Function draining the old container into the new one live), and cut over behind a feature flag — the same pattern as changing a partition key.

12. Which APIs support custom (sproc/feed) conflict resolution? Only the Cosmos DB for NoSQL API supports pluggable resolution (LWW path, stored-procedure resolver, and the manual conflicts feed). Cassandra, MongoDB and Gremlin APIs typically support LWW only, with no custom resolver — a key reason to choose the NoSQL API when you need ordered/mergeable conflict semantics.

These questions map to DP-420 (Designing and Implementing Cloud-Native Applications Using Microsoft Azure Cosmos DB), which covers consistency, global distribution, conflict resolution and the change feed, and touch AZ-305 (Solutions Architect Expert) for the multi-region HA/DR and RPO/RTO design. A compact cert-mapping for revision:

Question theme	Primary cert	Objective area
Consistency levels & trade-offs	DP-420	Design and implement data distribution
Conflict types & resolution policies	DP-420	Implement conflict resolution
LWW path / resolver sprocs / change feed	DP-420	Integrate and optimize; server-side programming
Multi-region HA/DR, RPO/RTO	AZ-305	Design business continuity solutions
RU cost of multi-write, sizing	DP-420 / AZ-305	Optimize cost; design data platform
Entra RBAC, private endpoints, CMK	AZ-305 / AZ-500	Secure the data platform

Quick check

You try to enable multi-region writes and the operation is rejected. What is the single most likely cause, and the one command that confirms it?
A reconciliation job finds a missing update but every log is clean and nothing is in the conflicts feed. What policy is almost certainly in play, and why is the feed empty?
True or false: scaling provisioned RU/s higher is the right fix when writes throttle (429) immediately after you enable multi-region writes.
Your app uses Session consistency. A user’s write in East US 2 isn’t visible to a different service reading in West Europe. Name two valid fixes.
You need to change a container’s conflict-resolution policy from LWW to a custom sproc. Can you do it in place? If not, what’s the migration?

Answers

The account is at Strong consistency, which is incompatible with multi-region writes (linearizability needs one global order). Confirm with az cosmosdb show -n <acc> -g <rg> --query "consistencyPolicy.defaultConsistencyLevel" returning Strong; lower it to Bounded Staleness/Session, then enable multi-write.
Default LWW on _ts. LWW resolves conflicts automatically and discards losers silently — they never appear in the conflicts feed — so a same-second _ts tie can drop a real write with no error trail. Fix with LWW on a monotonic /version or a custom resolver.
Partly true but usually the wrong framing. If the 429s come from the N× write multiplier of multi-region writes, raising provisioned/autoscale RU (or reducing write regions) is correct. But if RU is far below provisioned while one partition 429s, it’s a hot partition — fix the partition key, not the RU.
(a) Flow the session token (x-ms-session-token) from the writing tier to the reading service via header/cookie so read-your-writes is preserved across tiers; or (b) move the cross-region reader to Bounded Staleness for a global, bounded freshness guarantee that doesn’t need a token.
No — the policy is set at container creation and is effectively immutable. Migrate by creating a new container with the custom sproc policy, draining the change feed from the old container into it with a Function (live backfill), and cutting over behind a feature flag.
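
The session-token fix in answer 4 is easier to reason about with a toy model. The sketch below is an illustration under loud assumptions: `Replica` is hypothetical, the token is modelled as a plain integer sequence number (real session tokens are opaque strings), and a replica that is behind the token raises an error where a real SDK would retry.

```python
from typing import Optional


class Replica:
    """A regional replica that has applied writes up to `applied_lsn`."""

    def __init__(self) -> None:
        self.applied_lsn = 0
        self.value: Optional[str] = None

    def apply(self, lsn: int, value: str) -> None:
        self.applied_lsn, self.value = lsn, value

    def read(self, session_lsn: Optional[int] = None) -> Optional[str]:
        # With a token, refuse to answer from a replica that is behind it.
        if session_lsn is not None and self.applied_lsn < session_lsn:
            raise LookupError("replica is behind the session token; retry")
        return self.value


east, west = Replica(), Replica()

east.apply(lsn=1, value="order placed")   # write acknowledged locally in East US 2
token = east.applied_lsn                  # the writing tier captures the token

print(west.read())                        # None: no token, the stale read goes unnoticed
try:
    west.read(session_lsn=token)
except LookupError as err:
    print(err)                            # with the token, the staleness is detected

west.apply(lsn=1, value="order placed")   # replication catches up
print(west.read(session_lsn=token))       # order placed
```

The reader without the token gets an answer that looks valid but is stale; the reader with the token either gets the write or knows to retry, which is the whole point of flowing it across tiers.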

Glossary

Multi-region writes (multi-master) — an account-level mode where every associated region accepts writes for the same data and ACKs locally; replication is asynchronous.
Write region — a region that locally commits and acknowledges writes; under multi-write, every region is one.
failoverPriority — the contiguous, unique 0…N-1 ordering of regions; decides automatic-failover order, and (only under single-write) which region writes.
Consistency level — the read recency/ordering guarantee on the linear spectrum Strong → Bounded Staleness → Session → Consistent Prefix → Eventual.
Strong consistency — linearizable reads; requires a single global write order and is therefore incompatible with multi-region writes.
Bounded Staleness — reads lag by at most K versions or T seconds (multi-region minimums 100000 / 300); behaves like Strong within a single region.
Session consistency — read-your-writes and monotonic reads/writes within a session token; the Cosmos default and cheapest level.
Session token — x-ms-session-token, the value that scopes Session guarantees; must be flowed across tiers for cross-tier read-your-writes.
Consistent Prefix — reads never see writes out of order, with no recency bound.
Eventual — replicas converge eventually; reads may be out of order; lowest latency, highest availability.
Conflict — two live versions of the same id + partition key meeting during replication; one of insert, replace/update, or delete.
Conflict-resolution policy — the per-container, creation-time, effectively-immutable rule: LWW, Custom-sproc, or Custom-manual.
Last-Writer-Wins (LWW) — auto-resolution where the document with the higher value at a numeric path wins; losers are discarded silently. Default path is _ts.
Conflict-resolution path — the numeric property LWW compares; prefer a monotonic /version over _ts.
Resolver stored procedure — a JavaScript sproc invoked on each conflict with (incomingItem, existingItem, isTombstone, conflictingItems); must be deterministic and idempotent.
Conflicts feed — the per-container queue where unresolved conflicting versions land (Custom-manual, or on sproc failure); your app must drain it.
RPO (Recovery Point Objective) — the data lost on a region failure; non-zero for every multi-write level, bounded only by Bounded Staleness.
RTO (Recovery Time Objective) — time to recover capability; ≈0 for writes under multi-region writes (no promotion step).
ApplicationPreferredRegions — the ordered client-side region list that makes the SDK fail over locally on a regional failure without a redeploy.
Change feed — the ordered log of changes per container; the standard mechanism for migrating to a new container when an immutable property (partition key or conflict policy) must change.

Next steps

You can now configure multi-region writes deliberately, pick a defensible consistency level, and build conflict resolution that survives a region loss. Build outward:

Next: Cosmos DB Partition Key Design & RU Optimization — the modeling decision upstream of everything here; a bad key amplifies every multi-write problem.
Related: Azure Multi-Region Active-Active Architecture — the application-tier patterns that sit above this data layer.
Related: Azure Front Door & Traffic Manager: Global Failover — route users to the nearest healthy region in front of multi-region Cosmos.
Related: High Availability vs Disaster Recovery: RTO & RPO — the RPO/RTO framing that governs your consistency choice.
Related: Multi-Region Data Replication & Consistency Strategies — the general theory of replication and consistency beyond Cosmos.
Related: Azure Monitor & Application Insights for Observability — wire the ReplicationLatency and conflict alerts that keep divergence from hiding.