
Azure Storage Redundancy Decoded: LRS vs ZRS vs GRS vs RA-GRS and How to Choose

You create a storage account, the portal asks a question (Redundancy: LRS / ZRS / GRS / RA-GRS / GZRS / RA-GZRS), and most people pick the default and move on. Then one of two things happens. Either the bill arrives twice as large as the one the team next door pays for the same data, or a regional incident reveals that the “geo-redundant” account someone trusted as highly available was nothing of the sort: the data was safe in another region, but the app could not read it until somebody manually flipped a switch, and the last few seconds of writes were gone. That single dropdown decides three things at once: your durability (will the bytes survive a disk, a datacentre, a whole region failing), your availability (can the app keep reading and writing through a failure), and a meaningful slice of your bill.

Azure Storage redundancy is simply how many copies of your data exist, where they live, and what happens to your reads and writes when something fails. Every option keeps at least three copies at eleven nines of durability, so “will I lose the bytes to a single disk failure” is never the real question. The real questions are: how big a failure does this survive — a disk, a rack, a whole datacentre, an entire region? Is the protection synchronous (the copy is current) or asynchronous (it lags, so a sudden failure loses recent writes)? And does a copy you can actually read from exist, or is the second copy a cold standby you can only touch after a failover? Six SKUs answer differently, and the names encode the answers once you can read them.

By the end you will read those acronyms like a sentence: an L or a Z gives the primary-region layout, a G adds the cross-region copy, and RA tells you whether that copy is readable. You will know which failure each SKU survives, why GRS is disaster recovery you invoke rather than high availability you get, what RPO and RTO mean for your recovery promises, and you will have a one-page decision table to pick the right SKU from day one, instead of finding out the hard way during the incident.

What problem this solves

The pain shows up in three flavours. The first is paying for protection you don’t need: a team sets every account to GRS “to be safe,” doubling the bill across hundreds of terabytes of scratch data — CI artifacts, caches, regenerable files — nobody would recover from another region. Pure cost for data whose DR plan is “re-run the job.”

The second, and the dangerous one, is trusting geo-redundancy as high availability. Someone sees “data is replicated to a second region” and concludes the app keeps serving through a regional outage. Not with plain GRS/GZRS: the secondary is not reachable normally, only via a manual account failover that takes time. So during the very outage you bought geo to survive, the app is down until a human decides to fail over and waits for it. The protection was real; the expectation was wrong.

The third is silent data loss on failover. Geo-replication is asynchronous — there is always lag — so an unplanned failover loses writes not yet replicated. That window (the Recovery Point Objective) is usually small but never zero, and a team that assumed “no data loss” gets a surprise reconciliation problem. Knowing this changes the design: idempotent writes, a replayable event log, or a synchronous tier for data you truly cannot lose.

Who hits this: essentially everyone, because nearly every Azure service leans on a storage account — VM disks, Function/App Service packages, logs, Terraform state, registry layers, backups, data-lake files — and all of it inherited a redundancy choice somebody made (or defaulted) at creation. Get the model right once and every call afterward is deliberate.

Learning objectives

By the end of this article you can:

Prerequisites & where this fits

You should already understand what a storage account is — a named namespace and security-and-billing envelope holding the blob, file, queue and table services (acct.blob.core.windows.net). If that is new, read Azure Storage Account Fundamentals first; this article zooms all the way in on one dropdown it introduces. You should be able to run az in Cloud Shell and read JSON. The substrate underneath every SKU is Azure’s physical geography — availability zones (separate datacentres within one region, the basis of ZRS/GZRS) and region pairs (two linked regions, where the geo copy of GRS/GZRS lands); if those are fuzzy, Azure Regions & Availability Zones Explained is the ground floor, and the recovery vocabulary (RTO/RPO) is covered in Azure Business Continuity & Disaster Recovery: RTO/RPO Fundamentals.

This sits at the foundation of the Storage & Data track. It is a concept/decision article: once you can choose a SKU confidently, the adjacent layers are securing the account (Azure Key Vault: Secrets, Keys & Certificates, Azure Private Endpoint vs Service Endpoint) and, when access breaks, Troubleshooting Azure Storage: 403s, Firewall, Private Endpoint, RBAC & SAS.

Core concepts

Five mental models make every SKU obvious.

Durability is not availability. Durability asks whether the bytes will survive, and for every SKU the answer is yes to an extraordinary degree: at least three copies, with eleven nines (99.999999999%) of annual durability for LRS, twelve for ZRS and sixteen for the geo SKUs. Availability asks whether the app can reach the data right now, through this failure. You can have perfect durability and zero availability at once: bytes perfectly safe in a second region your app cannot read. Almost every redundancy misunderstanding collapses these two into one; keep them apart and the rest follows.
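
To make "eleven nines" concrete, here is a minimal arithmetic sketch. It assumes a simplified reading of the figure: each object independently survives a year with probability 1 - 10^-nines. The function name and the object count are illustrative, not anything Azure publishes.

```python
def expected_annual_object_loss(object_count: int, nines: int) -> float:
    """Expected number of objects lost per year.

    Assumption (simplified model): every object independently survives
    the year with probability 1 - 10**-nines.
    """
    annual_loss_probability = 10.0 ** -nines
    return object_count * annual_loss_probability


# One billion objects at eleven nines (LRS) and at sixteen nines (geo SKUs).
lrs_loss = expected_annual_object_loss(1_000_000_000, 11)
geo_loss = expected_annual_object_loss(1_000_000_000, 16)
print(f"LRS: {lrs_loss:.2e} objects/year, geo: {geo_loss:.2e} objects/year")
```

Under that model even a billion objects on LRS lose a small fraction of one object per year on average, which is why durability is rarely the deciding factor and availability usually is.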

Synchronous means current; asynchronous means lagging. A synchronous write is acknowledged only after every copy is durably stored, so all copies are always current — no data-loss window (LRS, ZRS). An asynchronous write is acknowledged as soon as the primary has it, then shipped to the secondary in the background, so the secondary lags — and if the primary is lost suddenly, whatever had not yet shipped is gone. Every cross-region (geo) copy is asynchronous, because synchronously waiting for a datacentre hundreds of kilometres away on every write would cripple latency. This single fact is why geo-redundancy has a non-zero RPO.
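
The difference between the two acknowledgement rules can be sketched with a toy model. This is not how Azure Storage is implemented; `ReplicatedStore` and its methods are hypothetical names used only to show why an asynchronous replica loses the most recent writes when the primary disappears suddenly.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReplicatedStore:
    """Toy model: one primary, one replica, one replication mode."""
    synchronous: bool
    primary: List[str] = field(default_factory=list)
    replica: List[str] = field(default_factory=list)
    pending: List[str] = field(default_factory=list)  # acked, not yet shipped

    def write(self, record: str) -> None:
        self.primary.append(record)
        if self.synchronous:
            self.replica.append(record)   # ack only once the replica has it
        else:
            self.pending.append(record)   # ack now, ship in the background

    def ship(self) -> None:
        """Background replication catches up (a no-op in synchronous mode)."""
        self.replica.extend(self.pending)
        self.pending.clear()

    def lose_primary(self) -> List[str]:
        """Sudden primary loss: only what reached the replica survives."""
        return list(self.replica)


def survivors_after_sudden_loss(synchronous: bool) -> List[str]:
    store = ReplicatedStore(synchronous=synchronous)
    store.write("order-1")
    store.write("order-2")
    store.ship()              # replication catches up once
    store.write("order-3")    # acknowledged just before the failure
    return store.lose_primary()


print("sync :", survivors_after_sudden_loss(True))
print("async:", survivors_after_sudden_loss(False))
```

In the asynchronous run the caller was told `order-3` succeeded, yet it never reached the replica. That gap is the non-zero RPO of every geo SKU.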

The secondary region is dark until you fail over. With GRS or GZRS the second copy exists but your app cannot read or write it normally — the endpoints all point at the primary. You use the secondary only by (a) the RA option, which lights up a read-only endpoint (acct-secondary.blob.core.windows.net); or (b) an account failover, which promotes the secondary to primary. Plain geo is a safe copy you cannot touch until failover; RA adds read access to the lagging copy; neither lets you write without failing over.
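
For RA SKUs, a stale-tolerant reader can fall back to the read-only endpoint when the primary is unreachable. The sketch below only illustrates the pattern: `read_with_fallback` is a hypothetical helper, `fetch` stands in for whatever HTTP or SDK call your application uses, and authentication is left out entirely.

```python
from typing import Callable, Tuple


def blob_endpoints(account: str) -> Tuple[str, str]:
    """Primary and read-only secondary blob endpoints for an RA account."""
    primary = f"https://{account}.blob.core.windows.net"
    secondary = f"https://{account}-secondary.blob.core.windows.net"
    return primary, secondary


def read_with_fallback(
    account: str, blob_path: str, fetch: Callable[[str], bytes]
) -> Tuple[bytes, str]:
    """Try the primary first; on a connection failure read the lagging copy."""
    primary, secondary = blob_endpoints(account)
    try:
        return fetch(f"{primary}/{blob_path}"), "primary"
    except ConnectionError:
        # The secondary may be stale and is never writable without failover.
        return fetch(f"{secondary}/{blob_path}"), "secondary"


def primary_is_down(url: str) -> bytes:
    """Stand-in fetch for the demo: the primary fails, the secondary answers."""
    if "-secondary" not in url:
        raise ConnectionError("primary region unreachable")
    return b"cached-image-bytes"


data, served_by = read_with_fallback("acct", "images/a.png", primary_is_down)
print(served_by, len(data))
```

The fallback helps reads only. A write path built the same way would have nowhere to go, because the secondary stays read-only until an account failover.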

Failover is a deliberate, account-wide operation with a clock. An account failover repoints the account so the secondary becomes primary. It is one switch for the whole account (not per-container), something you initiate (Azure does not silently flip it), and it takes time to complete and re-establish geo-replication. So even with GRS, “region down” to “app serving again” includes a human deciding plus the failover completing — that elapsed time is your RTO, which is why geo-redundancy is disaster recovery you invoke, not high availability you get for free.

The name encodes the layout. Read the SKU like a label: L (local) = three copies in one datacentre, Z (zone) = three copies across three zones, G adds an asynchronous geo copy in the paired region, and RA makes that copy readable. So RA-GZRS is the maximum on every axis. The next section turns this into a parsing table.

The vocabulary in one place

Every moving part side by side:

| Term | One-line meaning | Why it matters |
| --- | --- | --- |
| Durability | Will the bytes survive a failure | Every SKU is ≥ 11 nines; rarely the deciding factor |
| Availability | Can the app reach the data now | What geo-redundancy does not improve by itself |
| Synchronous | All copies current before write is acked | LRS, ZRS: no data-loss window |
| Asynchronous | Secondary lags the primary | All geo copies: non-zero RPO on unplanned failover |
| Availability zone | A physically separate datacentre in a region | What ZRS/GZRS spread copies across |
| Region pair | Two Azure regions linked for geo-replication | Where the geo copy lands |
| Primary region | Where your reads/writes go normally | Layout is one datacentre, or three zones if the SKU has a Z |
| Secondary region | The geo copy’s location | Dark unless RA (read) or failover (read+write) |
| RA (Read-Access) | Read-only secondary endpoint enabled | Read the lagging copy without failing over |
| Account failover | Promote secondary to primary | Manual, account-wide; defines storage RTO |
| RPO | Max data you might lose | Non-zero for geo (async lag) |
| RTO | How long until you’re serving again | Failover decision + completion time |

Reading the SKU names

Every redundancy SKU is built from a small grammar. Parse it once and the six codes stop being opaque.

| Token in the name | What it tells you | Example |
| --- | --- | --- |
| LRS (Locally-redundant) | 3 synchronous copies in one datacentre | Standard_LRS |
| ZRS (Zone-redundant) | 3 synchronous copies across 3 availability zones | Standard_ZRS |
| GRS (Geo-redundant) | LRS in primary + async LRS in the paired region | Standard_GRS |
| GZRS (Geo-zone-redundant) | ZRS in primary + async LRS in the paired region | Standard_GZRS |
| RA- prefix | The geo secondary has a read-only endpoint | Standard_RAGRS, Standard_RAGZRS |

Two reading rules make it click. First, the primary-region story: a Z anywhere in the name means three copies across three zones, and no Z (LRS, GRS, RA-GRS) means three copies in one datacentre. Second, the cross-region story: a G adds an asynchronous copy in the paired region, and RA makes that copy readable. The family then lays out from “cheapest, smallest blast radius” to “most expensive, widest”:

| SKU (API name) | Primary layout | Geo copy? | Secondary readable? | Survives up to |
| --- | --- | --- | --- | --- |
| Standard_LRS | 3 copies, 1 datacentre | No | No (no secondary region) | A disk / server / rack failure |
| Standard_ZRS | 3 copies, 3 zones | No | No (synchronous, no secondary region) | A full datacentre / zone outage |
| Standard_GRS | 3 copies, 1 datacentre | Yes (async) | No | A region-wide disaster (after failover) |
| Standard_RAGRS | 3 copies, 1 datacentre | Yes (async) | Yes (read-only) | A region disaster; read during outage |
| Standard_GZRS | 3 copies, 3 zones | Yes (async) | No | A zone outage and a region disaster |
| Standard_RAGZRS | 3 copies, 3 zones | Yes (async) | Yes (read-only) | Zone + region; read during outage |

Notice what the names do not promise: no geo SKU makes the secondary writable without a failover, and none makes replication synchronous, so the geo copy always lags. ZRS is also the only non-geo SKU that survives a whole datacentre loss, because its copies are already in three buildings. Internalise this table and you have 80% of the topic.
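
The naming grammar is regular enough to express as a few lines of code. The sketch below is illustrative only: `parse_sku` is a hypothetical helper, not part of any Azure SDK, and it accepts just the six Standard SKUs discussed here.

```python
KNOWN_CODES = {"LRS", "ZRS", "GRS", "RAGRS", "GZRS", "RAGZRS"}


def parse_sku(sku: str) -> dict:
    """Decode a redundancy SKU such as 'Standard_RAGZRS' or 'RA-GRS'."""
    code = sku[len("Standard_"):] if sku.startswith("Standard_") else sku
    code = code.replace("-", "").upper()
    if code not in KNOWN_CODES:
        raise ValueError(f"unrecognised redundancy SKU: {sku!r}")

    read_access = code.startswith("RA")       # RA prefix: readable secondary
    core = code[2:] if read_access else code  # LRS, ZRS, GRS or GZRS
    return {
        # A Z anywhere in the core name means a zone-spread primary.
        "primary_layout": "three zones" if "Z" in core else "one datacentre",
        # A leading G means an asynchronous copy in the paired region.
        "geo_copy": core.startswith("G"),
        "secondary_readable": read_access,
    }


for name in ("Standard_LRS", "Standard_ZRS", "Standard_GRS",
             "Standard_RAGRS", "Standard_GZRS", "Standard_RAGZRS"):
    print(name, parse_sku(name))
```

Note that nothing in the returned dictionary says "writable secondary" or "synchronous geo copy", because no SKU name can express either.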

What each SKU actually protects against

Durability is a given; the differentiator is blast radius — how large a failure the SKU rides through without you losing access. Walking up the ladder makes the trade-offs visible.

| Failure event | LRS | ZRS | GRS / RA-GRS | GZRS / RA-GZRS |
| --- | --- | --- | --- | --- |
| Single disk / drive failure | Survives | Survives | Survives | Survives |
| Server / rack / node failure | Survives | Survives | Survives | Survives |
| Whole datacentre / zone outage | Lost (all 3 copies there) | Survives (other 2 zones) | Lost in primary (geo copy intact) | Survives (other zones) |
| Region-wide disaster | Lost | Lost (one region only) | Survives after failover | Survives after failover |
| Accidental delete / overwrite / ransomware | Not covered | Not covered | Not covered | Not covered |

Two truths jump out. ZRS is the cheapest way to survive losing an entire datacentre — its copies already span three, and a zone outage is far more common than a whole region going dark. And no SKU protects you from yourself: a deleted blob, an overwritten file, a ransomware-encrypted container replicates faithfully to every copy, including the geo secondary. Redundancy answers “what if the infrastructure fails,” never “what if I corrupt the data” — that is the job of soft delete, versioning, immutability and backups.
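
The blast-radius table reduces to two properties of a SKU: whether the primary is zone-spread and whether a geo copy exists. The following sketch encodes that reasoning; `outcome` and the failure labels are hypothetical names chosen for illustration.

```python
def outcome(zonal_primary: bool, geo_copy: bool, failure: str) -> str:
    """Return what happens to access for a given failure class."""
    if failure == "accidental_delete":
        # Deletes and overwrites replicate to every copy, geo included.
        return "not covered"
    if failure in ("disk", "rack"):
        return "survives"            # three copies absorb these everywhere
    if failure == "zone":
        if zonal_primary:
            return "survives"        # the other zones keep serving
        return "lost in primary (geo copy intact)" if geo_copy else "lost"
    if failure == "region":
        return "survives after failover" if geo_copy else "lost"
    raise ValueError(f"unknown failure class: {failure!r}")


layouts = {
    "LRS": (False, False),
    "ZRS": (True, False),
    "GRS": (False, True),
    "GZRS": (True, True),
}
for sku, (zonal, geo) in layouts.items():
    print(sku, {f: outcome(zonal, geo, f) for f in ("zone", "region")})
```

The RA variants behave like their non-RA counterparts here, since read access changes what you can do during an outage, not which failures the data survives.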

For when to pick which, the decision table below maps requirements straight to SKUs.

HA vs DR: why geo-redundancy is not high availability

This is the single most expensive misunderstanding in the topic. High availability (HA) means the app keeps serving through a failure automatically, with little or no human action. Disaster recovery (DR) means you can recover after a failure, with a deliberate procedure and a planned recovery time. ZRS delivers HA within a region — a zone dies and reads/writes continue with no action. Geo-redundancy (GRS/GZRS) delivers DR across regions — the data is safe in the pair, but bringing the app back means initiating a failover and waiting for it to complete.

| Property | ZRS (zone HA) | GRS / GZRS (geo DR) | RA-GRS / RA-GZRS |
| --- | --- | --- | --- |
| Protects against | Zone/datacentre outage | Region disaster | Region disaster |
| Replication to the protective copy | Synchronous | Asynchronous | Asynchronous |
| Data-loss window (RPO) | Zero | Non-zero (async lag) | Non-zero (async lag) |
| App keeps serving automatically? | Yes (other zones) | No, needs failover | Reads from secondary, writes need failover |
| Human action required at failure | None | Initiate account failover | Initiate failover for writes |
| Recovery time (RTO) | ~Immediate | Failover decision + completion | Same for writes; reads immediate |
| What it is | High availability | Disaster recovery | DR + read scale/standby |

The design implications are immediate. “Stay up through a datacentre failure with no data loss, no manual step” is ZRS — adding GRS does nothing for availability, only a DR copy. “Survive an entire region loss” needs a geo SKU plus a failover runbook plus a small RPO. And true “stay up through a regional outage automatically” is not something storage redundancy delivers — that needs a multi-region application architecture on top; see Azure Multi-Region Active-Active Design. The SKU protects the data; keeping the application serving across regions is your design above it.

RPO and RTO for storage, concretely

Two acronyms govern every DR conversation. RPO (Recovery Point Objective) is the maximum data, in time, you can lose — “at most the last N minutes of writes.” RTO (Recovery Time Objective) is the maximum time you can be down — “serving again within N minutes/hours.” Mapped to storage SKUs:

| Scenario | RPO (data loss) | RTO (time to recover) | What drives it |
| --- | --- | --- | --- |
| LRS, datacentre lost | Total (no surviving copy) | Until you restore from backup | No redundancy survives the event |
| ZRS, zone lost | Zero | ~Immediate (other zones serve) | Synchronous, automatic |
| GRS/GZRS, planned failover (primary still reachable) | None expected (secondary catches up before the switch) | Failover completion time | Sync completes first, then failover duration |
| GRS/GZRS, region lost, unplanned failover | Non-zero: recent unreplicated writes lost | Decision + failover completion | Async lag is the loss window |
| RA-GRS/RA-GZRS, region lost, reads | n/a for reads (stale-tolerant) | ~Immediate for reads | Secondary endpoint already live |

The crucial line is the unplanned-failover row: because geo-replication is asynchronous, when the primary region is genuinely gone you fail over to whatever had already replicated — writes still in flight are lost. The Last Sync Time shows how far behind the secondary is; everything after it may be gone. This is why “we paid for GRS so we won’t lose data” is wrong, and why critical systems pair geo with idempotent writes, a replayable event log, or a secondary write path.
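
A small sketch shows how Last Sync Time turns into a concrete list of writes to reconcile. It assumes your application keeps its own record of write identifiers and timestamps; the order IDs and times below are invented for illustration.

```python
from datetime import datetime, timezone
from typing import List, Tuple


def writes_at_risk(
    writes: List[Tuple[str, datetime]], last_sync_time: datetime
) -> List[str]:
    """IDs of writes made after Last Sync Time.

    Anything written after that moment may not have reached the secondary,
    so it may be missing after an unplanned failover.
    """
    return [write_id for write_id, written_at in writes
            if written_at > last_sync_time]


last_sync = datetime(2024, 5, 1, 10, 15, 0, tzinfo=timezone.utc)
writes = [
    ("order-101", datetime(2024, 5, 1, 10, 14, 30, tzinfo=timezone.utc)),
    ("order-102", datetime(2024, 5, 1, 10, 15, 20, tzinfo=timezone.utc)),
    ("order-103", datetime(2024, 5, 1, 10, 16, 5, tzinfo=timezone.utc)),
]
at_risk = writes_at_risk(writes, last_sync)
print(at_risk)
```

The output of a check like this is the input to your reconciliation step: the writes to replay from an event log or to re-request from upstream.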

The decision table — pick a SKU in one read

Match your requirement in the left column to the SKU on the right. This is the one-pager to keep:

| If your requirement is… | …then choose | Why |
| --- | --- | --- |
| Data is reproducible / scratch / derived | LRS | Cheapest; DR plan is “re-run the job” |
| Data must not leave one region (residency) | LRS or ZRS | No geo copy crosses to the paired region |
| Stay up through a datacentre/zone loss, zero data loss, no manual step | ZRS | Synchronous across 3 zones = in-region HA |
| Survive a full regional disaster | GRS / GZRS | Async geo copy in the paired region |
| Regional DR and automatic zone survival | GZRS | ZRS primary + geo; best for critical data |
| Need to read during a primary-region read outage | RA-GRS / RA-GZRS | Read-only secondary endpoint is live |
| App must stay serving across a whole region automatically | (storage SKU alone won’t do it) | Needs multi-region app design above storage |
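
The decision table can also be written as a small function, which is handy as a policy check in provisioning scripts. This is one possible encoding, not an official rule set; `choose_sku` and its parameter names are hypothetical.

```python
def choose_sku(
    *,
    needs_zone_ha: bool = False,
    needs_regional_dr: bool = False,
    needs_secondary_reads: bool = False,
    residency_locked: bool = False,
) -> str:
    """Map requirements from the decision table to a Standard SKU name."""
    wants_geo = needs_regional_dr or needs_secondary_reads
    if residency_locked and wants_geo:
        # Every geo SKU places a full copy in the paired region.
        raise ValueError("residency-locked data cannot use a geo SKU")
    if not wants_geo:
        return "Standard_ZRS" if needs_zone_ha else "Standard_LRS"
    base = "GZRS" if needs_zone_ha else "GRS"
    prefix = "RA" if needs_secondary_reads else ""
    return f"Standard_{prefix}{base}"


print(choose_sku())                                           # scratch data
print(choose_sku(needs_zone_ha=True))                         # in-region HA
print(choose_sku(needs_zone_ha=True, needs_regional_dr=True)) # critical data
```

The function deliberately has no input for "stay serving across a whole region automatically", because no return value would satisfy it.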

Architecture at a glance

Picture the data flowing left to right through the resilience tiers. Your application writes through the account’s public endpoint (acct.blob.core.windows.net, HTTPS/443) to the primary region, where the layout depends on whether the SKU carries a Z: the LRS-style SKUs (LRS, GRS, RA-GRS) keep the three copies in a single datacentre; the ZRS-style SKUs (ZRS, GZRS, RA-GZRS) spread them across three availability zones, so a whole datacentre can drop and synchronous writes continue on the surviving zones. That is the high-availability half. Every write is acknowledged as soon as it is durably stored on the primary-region copies, without waiting for any other region, which keeps latency low.

Then, only if the SKU carries a G, that write is shipped asynchronously across the region pair to the secondary region, landing as a locally-redundant copy hundreds of kilometres away — the disaster-recovery half, and the lag on that hop is your RPO. The secondary stays dark unless you chose an RA SKU (read-only endpoint acct-secondary.blob.core.windows.net); making it writable requires an account failover whose elapsed time is your RTO. The badges mark where this bites: the zone boundary (what ZRS saves you from), the async lag (data lost on unplanned failover), and the failover switch (time to recover).

Figure: left-to-right Azure Storage redundancy architecture. An application writes over HTTPS to a primary region whose three synchronous copies span three availability zones (zone-level HA). An asynchronous geo-replication hop crosses the region pair to a secondary region holding a locally-redundant copy. A read-only secondary endpoint is available only on RA SKUs, and a manual account-failover switch promotes the secondary to primary. Numbered badges mark the zone-loss boundary, the async replication lag (RPO), and the failover operation (RTO).

Real-world scenario

ShopVeda, a mid-size Indian e-commerce company, ran its entire platform on one Standard_GRS storage account in Central India — product images, the order-event journal, and nightly database exports. Someone had set everything to GRS years ago “for safety” and nobody had revisited it. The bill was uncomfortable (geo roughly doubles the storage charge, and they were replicating everything, including 40 TB of regenerable image thumbnails) — but the real reckoning came during a regional incident.

When Central India had a multi-hour service degradation, the on-call runbook said only: “Storage is GRS, data is safe in the paired region.” True — and useless in the moment. Checkout was returning 503s because it could not write order events, and the engineer assumed GRS meant the app would “just read from the other region.” It did not: GRS has no readable secondary, so there was no path to the data without an account failover — never rehearsed, risky to trigger during a partial outage, and one that would have cost time and possibly the most recent order writes (async). They rode it out with checkout down, because failing over felt riskier than waiting.

The post-incident redesign re-tiered by data class. The order-event journal — irreplaceable and must-stay-serving — moved to Standard_GZRS (ZRS primary so a zone failure is survived automatically with zero data loss, plus geo for regional DR), with an app-level secondary write path so a failover would not lose in-flight orders. The product images moved to Standard_RAGZRS so the catalogue keeps rendering from the read-only secondary during a primary-region read outage (stale images for minutes are harmless). And the 40 TB of regenerable thumbnails dropped to Standard_LRS — no geo, because the DR plan for derived data is “re-run the resize job.” That cut the storage bill by roughly a third while improving resilience for the data that mattered, and replaced a one-line runbook with a rehearsed failover and a documented RPO/RTO per data class. The lesson at the top of the new runbook: redundancy is a per-data-class decision, not an account-wide default, and GRS is a recovery plan you practise — not availability you assume.

Advantages and disadvantages

There is no single “best” SKU; each trades cost, blast radius, recovery latency and complexity. The honest two-column view:

| SKU | Advantages | Disadvantages |
| --- | --- | --- |
| LRS | Cheapest; synchronous (no data loss window); simplest; satisfies single-region residency | Zero protection against datacentre/zone or region loss |
| ZRS | Survives a full datacentre/zone outage automatically with zero data loss; true in-region HA; modest premium | Only in zone-enabled regions; no protection from regional disaster |
| GRS | Survives a regional disaster; sixteen nines of durability thanks to the cross-region copy | Secondary is dark (no read); async = non-zero RPO; failover is manual + takes time; ~2× cost |
| RA-GRS | All of GRS plus readable secondary for stale-tolerant reads / standby | Reads may be stale; writes still need failover; cost of GRS + read transactions |
| GZRS | Zone HA and regional DR in one SKU; best resilience for critical data | Highest cost; zone-region-only; geo half still async + manual failover |
| RA-GZRS | Maximum: zone HA + regional DR + readable secondary | Highest cost of all; same async/failover caveats on the geo half |

When does each matter? LRS when the data is reproducible or residency-locked — geo there is pure waste. ZRS for the broad middle of production data: the most resilience-per-rupee, surviving a zone outage automatically. GRS/GZRS when a regulator or board demands survival of a regional catastrophe, or the data is irreplaceable — you accept the cost, the RPO and a failover plan. The RA variants matter only when you have staleness-tolerant read traffic to serve during a primary-region read outage, or want a queryable copy for verification — not as a substitute for HA on the write path.

Hands-on lab

This lab creates a tiny empty account, changes its redundancy, reads its properties, and deletes it — the charge for a few minutes is negligible. You need the Azure CLI signed in (az login). Use your own globally-unique account name.

1. Create a resource group and an LRS account — the floor.

az group create --name rg-redundancy-lab --location centralindia

az storage account create \
  --name stredundlab$RANDOM \
  --resource-group rg-redundancy-lab \
  --location centralindia \
  --sku Standard_LRS \
  --kind StorageV2 \
  --https-only true \
  --min-tls-version TLS1_2

Expected: JSON describing the account with "sku": { "name": "Standard_LRS", "tier": "Standard" }.

2. Read the current redundancy. Note there is no secondary endpoint yet.

ACCT=$(az storage account list -g rg-redundancy-lab --query "[0].name" -o tsv)

az storage account show -n "$ACCT" -g rg-redundancy-lab \
  --query "{sku:sku.name, primary:primaryEndpoints.blob, secondary:secondaryEndpoints.blob, primaryStatus:statusOfPrimary}" -o json

Expected: secondary is null — LRS has no geo secondary.

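3. Change redundancy live. On a Standard general-purpose v2 account, switching between LRS and GRS/RA-GRS is a direct SKU update with no downtime. Adding or removing zone redundancy (LRS↔ZRS, GRS↔GZRS) also avoids downtime, but it runs as a conversion that Azure carries out in the background after you request it, so do not expect it to complete the moment you set a flag. Upgrade to Standard_RAGRS to light up a readable secondary: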

az storage account update -n "$ACCT" -g rg-redundancy-lab --sku Standard_RAGRS

# Re-read: a secondary endpoint now exists
az storage account show -n "$ACCT" -g rg-redundancy-lab \
  --query "{sku:sku.name, secondary:secondaryEndpoints.blob, secondaryStatus:statusOfSecondary}" -o json

Expected: sku is now Standard_RAGRS and secondary shows an ...-secondary.blob.core.windows.net URL.

4. Inspect geo-replication health — the Last Sync Time. This is how far behind the secondary is; everything written after it is at risk on an unplanned failover.

az storage account show -n "$ACCT" -g rg-redundancy-lab \
  --expand geoReplicationStats \
  --query "{status:geoReplicationStats.status, lastSyncTime:geoReplicationStats.lastSyncTime, canFailover:geoReplicationStats.canFailover}" -o json

Expected: status is Live once replication completes (briefly Bootstrap right after enabling geo); lastSyncTime is a recent UTC timestamp.
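
If you want to alert on replication lag rather than eyeball it, the JSON from step 4 can be turned into a number. The sketch below assumes the output shape produced by the query above and a `lastSyncTime` in ISO 8601 form; the sample payload is invented, and real timestamps with unusual fractional-second precision may need extra parsing.

```python
import json
from datetime import datetime, timezone


def replication_lag_seconds(stats_json: str, now: datetime) -> float:
    """Seconds between Last Sync Time and `now` for a geo-replicated account."""
    stats = json.loads(stats_json)
    if stats.get("status") != "Live":
        # For example 'Bootstrap' right after geo-replication is enabled.
        raise RuntimeError(f"geo-replication is not Live: {stats.get('status')}")
    raw = stats["lastSyncTime"].replace("Z", "+00:00")  # tolerate a 'Z' suffix
    last_sync = datetime.fromisoformat(raw)
    return (now - last_sync).total_seconds()


# Invented sample in the shape of the step 4 query output.
sample = json.dumps({
    "status": "Live",
    "lastSyncTime": "2024-05-01T10:15:00+00:00",
    "canFailover": True,
})
now = datetime(2024, 5, 1, 10, 20, 0, tzinfo=timezone.utc)
lag = replication_lag_seconds(sample, now)
print(f"secondary is {lag:.0f} seconds behind")
```

A lag that keeps growing instead of hovering around a steady value is the signal worth paging on, since it widens the data-loss window of an unplanned failover.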

5. (Read-only) The failover command — do not run it casually. Account failover promotes the secondary; on a real account it has consequences (see Common mistakes). For reference:

# DR drills only. After failover the account becomes LRS in the new primary
# until you re-enable geo. Initiating it needs the data to be in sync.
az storage account failover --name "$ACCT" --resource-group rg-redundancy-lab

6. Equivalent Bicep — how the redundancy choice should live in source control, not as a portal click:

@description('Globally unique storage account name')
param storageAccountName string
param location string = resourceGroup().location

resource sa 'Microsoft.Storage/storageAccounts@2023-05-01' = {
  name: storageAccountName
  location: location
  sku: {
    name: 'Standard_GZRS'   // zone HA in primary + async geo to the pair
  }
  kind: 'StorageV2'
  properties: {
    minimumTlsVersion: 'TLS1_2'
    supportsHttpsTrafficOnly: true
    allowBlobPublicAccess: false
  }
}

output primaryBlob string = sa.properties.primaryEndpoints.blob

7. Tear down so nothing lingers on the bill:

az group delete --name rg-redundancy-lab --yes --no-wait

Common mistakes & troubleshooting

The model sticks once you have seen how it bites — symptom, root cause, how to confirm, and fix.

| # | Symptom | Root cause | How to confirm | Fix |
| --- | --- | --- | --- | --- |
| 1 | App down during a regional incident despite GRS | GRS secondary is not readable; needs failover | secondaryEndpoints is null on a plain GRS account | Use RA-GRS/RA-GZRS for read access, and have a rehearsed failover runbook |
| 2 | Recent writes missing after a failover | Geo-replication is async; in-flight writes lost | Compare write times to geoReplicationStats.lastSyncTime | Idempotent writes + replayable event log; accept RPO or add a synchronous write path |
| 3 | Storage bill doubled for no clear reason | Geo SKU replicating regenerable data | az storage account show --query sku.name shows *GRS/*GZRS | Re-tier scratch/derived data to Standard_LRS |
| 4 | “ZRS not allowed in this region” error on create | The region has no availability zones | Check region zone support before choosing ZRS/GZRS | Use a zone-enabled region, or LRS/GRS if zones are unavailable |
| 5 | Cannot change Premium account from LRS to GRS | Premium accounts offer only LRS and ZRS, no geo SKUs | sku.tier shows Premium; geo not offered | Use object replication or app-level copy; or Standard for geo needs |
| 6 | az storage account failover refused | Secondary not in sync / canFailover false | geoReplicationStats.canFailover is false | Wait for Live + in-sync; check lastSyncTime advancing |
| 7 | Reads from -secondary endpoint return 404 for new data | Secondary lags; object not replicated yet | The blob was written after lastSyncTime | Read primary for fresh data; treat secondary as stale-tolerant |
| 8 | Deleted blob is gone from every copy including geo | Redundancy replicates deletes faithfully | The delete propagated to the secondary too | Enable soft delete + versioning; redundancy ≠ backup |
| 9 | After failover the account is now LRS | Failover leaves the new primary as LRS until re-geo | sku.name reads Standard_LRS post-failover | Re-enable geo (--sku *GRS) once the new primary is stable |
| 10 | Changing GRS→ZRS seems to do nothing / errors | Some conversions need a migration, not a flag | The requested change isn’t a supported live conversion | Use a supported path (often via an intermediate SKU) or a planned migration / object replication |

The two that cost teams most are #1 and #2: the HA-vs-DR confusion (“geo-redundant” did not mean “stays up”) and the async-lag surprise (assuming zero data loss with no reconciliation plan). #8 is the other classic — redundancy is not backup: no SKU undoes a mistaken delete or ransomware encryption; that is what soft delete, versioning, immutability and point-in-time restore are for.
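
The fix for #2, idempotent writes plus a replayable event log, can be sketched as follows. This is a minimal illustration under the assumption that every event carries a unique ID and that the log itself lives somewhere that survives the failover; `replay_events` and the event fields are hypothetical.

```python
from typing import Dict, Iterable, Set


def replay_events(
    events: Iterable[dict], totals: Dict[str, int], applied_ids: Set[str]
) -> None:
    """Apply each event at most once, keyed by its unique ID."""
    for event in events:
        if event["id"] in applied_ids:
            continue                      # already reflected in the state
        totals[event["item"]] = totals.get(event["item"], 0) + event["qty"]
        applied_ids.add(event["id"])


journal = [
    {"id": "evt-1", "item": "shirt", "qty": 2},
    {"id": "evt-2", "item": "shirt", "qty": 1},
    {"id": "evt-3", "item": "mug", "qty": 4},
]

# State as found on the new primary: evt-1 had replicated, the rest had not.
units_sold = {"shirt": 2}
applied = {"evt-1"}

replay_events(journal, units_sold, applied)
replay_events(journal, units_sold, applied)   # a second replay is harmless
print(units_sold)
```

Because replaying is safe to repeat, the recovery runbook does not need to know exactly which writes were lost. It can replay everything since a point comfortably before Last Sync Time.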

Best practices

Security notes

Encryption is on for every SKU and every copy — at rest by default with platform-managed keys (or customer-managed keys in Key Vault) — and it follows the data to the geo secondary, so a replicated copy is never less protected than the primary. See Azure Key Vault: Secrets, Keys & Certificates for customer-managed keys.

The subtle consideration is data residency and sovereignty. The instant you choose a geo SKU, a full copy lives in the paired region — possibly a different state or country — and if a rule forbids data leaving a jurisdiction, a geo SKU silently violates it (replication is automatic). There, LRS or ZRS are the compliant choice and you achieve DR differently. Always confirm the paired region before enabling geo.

The read-only secondary endpoint on RA SKUs is a second public surface for the same data — lock it down with the same Entra RBAC, firewall rules and private networking as the primary (see Azure Private Endpoint vs Service Endpoint). And because account keys and SAS are valid against the secondary too, the discipline in Troubleshooting Azure Storage: 403s, Firewall, Private Endpoint, RBAC & SAS applies to both endpoints.

Cost & sizing

Redundancy is one of the biggest multipliers on a storage bill, because it changes how many copies of every gigabyte you store. The rough relationship (prices vary by region/tier/meter — treat as orders of magnitude):

| SKU | Relative storage cost | What you’re paying for | When it’s worth it |
| --- | --- | --- | --- |
| Standard_LRS | 1.0× (baseline) | 3 local copies | Reproducible/residency-locked data |
| Standard_ZRS | ~1.25× | 3 copies across 3 zones | Default production single-region data |
| Standard_GRS | ~2× | LRS + async geo copy | Must survive a regional disaster |
| Standard_RAGRS | ~2× + read transactions | GRS + readable secondary | Geo DR plus stale-tolerant reads |
| Standard_GZRS | ~2.5× | ZRS + async geo copy | Critical data: zone HA + regional DR |
| Standard_RAGZRS | ~2.5× + read transactions | GZRS + readable secondary | Maximum resilience + secondary reads |

Three things drive the bill: capacity stored (× the redundancy factor), transactions (RA secondary reads billed on top), and geo-replication data transfer (a per-GB charge proportional to write volume — chatty writers pay more for geo than archival ones). To right-size: drop geo on regenerable data to LRS, default the middle to ZRS, reserve GRS/GZRS for data that genuinely needs regional survival. Moving 40 TB from Standard_GRS (~2×) to Standard_LRS (1×) halves its storage charge — tens of thousands of rupees a month for zero loss of real resilience. There is no free tier for geo-redundancy; you pay less only by not replicating data that doesn’t need it.
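
The capacity part of that arithmetic is easy to script when reviewing many accounts. The multipliers below are the rough relative figures from the table above, not real prices, and the sketch ignores transactions and geo-replication transfer charges.

```python
# Rough relative storage multipliers from the table (LRS = 1.0).
RELATIVE_STORAGE_COST = {
    "Standard_LRS": 1.0,
    "Standard_ZRS": 1.25,
    "Standard_GRS": 2.0,
    "Standard_RAGRS": 2.0,
    "Standard_GZRS": 2.5,
    "Standard_RAGZRS": 2.5,
}


def relative_capacity_cost(capacity_tb: float, sku: str) -> float:
    """Capacity cost in 'LRS terabyte' units (capacity only)."""
    return capacity_tb * RELATIVE_STORAGE_COST[sku]


before = relative_capacity_cost(40, "Standard_GRS")
after = relative_capacity_cost(40, "Standard_LRS")
saving_fraction = (before - after) / before
print(f"{before:.0f} -> {after:.0f} units, saving {saving_fraction:.0%}")
```

Multiply the unit figures by your region's actual per-TB LRS price to get a currency estimate for the capacity line of the bill.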

Interview & exam questions

Q1. What does each letter in RA-GZRS mean? RA = Read-Access (readable secondary), GZ = geo-zone-redundant: zone-redundant in the primary, asynchronously geo-replicated to the paired region, with a read-only secondary endpoint. The maximum-resilience SKU. (AZ-900, AZ-104)

Q2. Is GRS high availability? No — GRS is disaster recovery. The geo secondary is asynchronous and not readable normally; using it requires a manual account failover that takes time and may lose recent writes. HA within a region comes from ZRS. (AZ-104)

Q3. Which SKU survives a full datacentre outage with zero data loss and no manual action? ZRS (and the ZRS-based GZRS/RA-GZRS): its three copies span three availability zones synchronously, so the surviving zones keep serving automatically. (AZ-900, AZ-104)

Q4. Why is the RPO non-zero for GRS/GZRS? Because cross-region geo-replication is asynchronous — writes are acknowledged once the primary has them and shipped to the secondary in the background. On an unplanned failover, writes after the Last Sync Time are lost, so the recovery point sits behind “now.” (AZ-104)

Q5. The difference between RA-GRS and GRS? Only the readable secondary: RA-GRS exposes acct-secondary.blob.core.windows.net for read-only access to the lagging copy any time, whereas GRS’s geo copy is dark until a failover. Neither is writable without failover. (AZ-104)

Q6. A regulator forbids data leaving the country. Which SKUs are safe? LRS and ZRS — all copies stay within the chosen region. Any geo SKU replicates a full copy to the paired region, which may be outside the jurisdiction. (AZ-104, AZ-500)

Q7. Does any redundancy SKU protect against an accidental delete? No — deletes, overwrites and ransomware encryption replicate faithfully to every copy, including the geo secondary. Protection against self-inflicted loss comes from soft delete, blob versioning, immutability and point-in-time restore. (AZ-104, AZ-500)

Q8. After an account failover, what redundancy is the account? It becomes LRS in the new primary until you re-enable geo-redundancy — re-establish the geo SKU once the new primary is stable to restore cross-region protection. (AZ-104)

Q9. Why prefer GZRS over GRS for critical workloads? GZRS has a zone-redundant primary, so a zone/datacentre outage is survived automatically with zero data loss and no failover, while still providing geo DR. Plain GRS has only LRS in the primary, so a zone loss there takes the primary down. (AZ-104)

Q10. What is “Last Sync Time” and why does it matter? It is the timestamp up to which the primary’s data is confirmed replicated to the secondary — the lag. Data written after it may be lost on an unplanned failover, so you check it before failing over to gauge data-loss exposure. (AZ-104)

Q11. Can you change LRS to GRS without downtime or data migration? Yes. On a Standard general-purpose v2 account, LRS↔GRS/RA-GRS is a live change you trigger by updating the SKU. Adding or removing zone redundancy (for example LRS↔ZRS) also avoids downtime but runs as a conversion you request, and some other changes (and Premium accounts) require a planned migration or object replication instead. (AZ-104)

Q12. ZRS vs GZRS — when does the extra cost pay off? When you need both in-region zone HA and survival of a full regional disaster. If only “survive a zone outage” is required, ZRS is sufficient and cheaper. (AZ-104)

Quick check

  1. Read the SKU Standard_RAGRS: where do the copies live and which are readable?
  2. Your app must keep serving reads and writes automatically through a single datacentre failure, with no manual step and no data loss. Which SKU?
  3. True or false: GRS keeps your application available during a regional outage with no action required.
  4. Why can an unplanned failover from a GRS account lose data?
  5. A storage account holds 30 TB of nightly-regenerated derived files. What redundancy is appropriate, and why?

Answers

  1. Standard_RAGRS — three synchronous copies in the primary datacentre plus an asynchronous LRS copy in the paired region, and RA exposes that geo copy through a read-only secondary endpoint (lagging, read-only).
  2. ZRS (or GZRS if you also need regional DR). ZRS spreads three synchronous copies across three availability zones, so losing one zone leaves the others serving automatically with zero data loss — in-region HA. GRS would not satisfy this; its secondary needs a manual failover.
  3. False. GRS is disaster recovery, not HA — its secondary is asynchronous and unreachable without a manual account failover that takes time and may lose recent writes. In-region automatic availability comes from ZRS.
  4. Because geo-replication is asynchronous: writes are acknowledged once the primary has them and shipped to the secondary in the background, so anything written after the Last Sync Time is lost if the primary is lost suddenly.
  5. Standard_LRS. The data is reproducible (the DR plan is “re-run the job”), so paying ~2× for geo (or even the ZRS premium) is wasted. LRS gives eleven nines at the lowest cost, which is all regenerable data needs.

Glossary

Next steps


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
