Most Blob Storage incidents are not ransomware or region outages. They are a service principal with Storage Blob Data Contributor running a del against the wrong prefix, a lifecycle rule that tiered hot data to archive because someone got a filter wrong, or a compliance auditor asking for the WORM evidence that nobody actually enabled. Data protection on Blob Storage is a stack of independent features that each cover a different failure mode, and they interact in ways that bite you if you enable them in the wrong order. This is how to engineer the full stack: tiering for cost, versioning and soft delete for recovery, point-in-time restore for bulk rollback, and immutable storage for compliance — with the prerequisites and the gotchas.
The whole stack assumes a standard general-purpose v2 account. Premium block blob accounts support versioning and soft delete, but not access tiering or point-in-time restore. Hierarchical namespace (HNS, Data Lake Gen2) restrictions are noted where they matter: several of the recovery features below are unavailable on HNS-enabled accounts, so check that first if you are on ADLS Gen2.
1. Access tiers and the cost/retrieval tradeoff
Blob Storage has four access tiers: three online (hot, cool, cold) and one offline (archive). The entire economic model is a trade between storage cost on one side and access cost plus retrieval latency on the other. Get this backwards and you either overpay for cold data or pay rehydration penalties on data you read weekly.
| Tier | Storage cost | Access (read) cost | Min retention | Retrieval latency |
|---|---|---|---|---|
| Hot | Highest | Lowest | None | Milliseconds |
| Cool | Lower | Higher | 30 days | Milliseconds |
| Cold | Lower still | Higher still | 90 days | Milliseconds |
| Archive | Lowest | Highest | 180 days | Hours (rehydrate) |
The rules that actually trip teams up:
- Cool and cold are online. You read them at millisecond latency like hot; you just pay more per read operation, plus a per-GB data retrieval charge. They are for infrequently accessed data, not unreachable data.
- Archive is offline. A blob in archive cannot be read until you rehydrate it back to hot/cool/cold, which takes up to ~15 hours at standard priority (faster at high priority, for a fee). Plan for that latency in any restore runbook.
- Early-deletion charges are real. Move a blob to cool and delete it (or re-tier it) before 30 days and you are billed for the remaining days, as if it had sat there the full 30. Cold bills the full 90 and archive the full 180. This is the single most common surprise on a lifecycle bill: aggressive tier-down on short-lived data costs more, not less.
- Tier is set at the blob level. The account “default access tier” only applies to blobs that have never had an explicit tier set.
Rule of thumb: tier down only data you are confident you will not read before the minimum retention elapses. For data you might restore, archive’s rehydration latency means it belongs in your DR plan, not your hot path.
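To make the early-deletion rule concrete, here is a minimal Python sketch of the prorated charge. It assumes the charge equals the storage cost for the days remaining in the tier's minimum retention and uses a 30-day billing month. The per-GB price is a hypothetical placeholder, not a published rate.

```python
# Minimum retention per tier in days, taken from the table above.
MIN_RETENTION_DAYS = {"hot": 0, "cool": 30, "cold": 90, "archive": 180}


def remaining_penalty_days(tier: str, days_stored: int) -> int:
    """Days of storage still billed if the blob leaves the tier now."""
    return max(MIN_RETENTION_DAYS[tier] - days_stored, 0)


def early_deletion_charge(tier: str, days_stored: int, size_gb: float,
                          price_per_gb_month: float) -> float:
    """Prorated early-deletion charge.

    price_per_gb_month is a hypothetical input: look up the real rate for
    your region and redundancy. A 30-day month is assumed.
    """
    days = remaining_penalty_days(tier, days_stored)
    return size_gb * price_per_gb_month * days / 30


# 500 GB archived, then deleted after 45 days: 135 days are still billed.
print(remaining_penalty_days("archive", 45))
print(early_deletion_charge("archive", 45, 500, 0.002))
```

Running the numbers for each candidate rule before deploying it shows quickly whether a tier-down on short-lived data saves anything at all.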
2. Authoring lifecycle management rules
Lifecycle management is a JSON policy on the account that the platform evaluates roughly once per day. It moves or deletes blobs based on age (last modified, last accessed, or creation time) and filters (prefix, blob type, index tags). One policy per account, up to 100 rules.
Here is a production-shaped policy: tier logs down to cool then cold then archive, expire them, and clean up old versions and snapshots independently.
{
  "rules": [
    {
      "enabled": true,
      "name": "tier-and-expire-logs",
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": [ "blockBlob" ],
          "prefixMatch": [ "logs/app/" ]
        },
        "actions": {
          "baseBlob": {
            "tierToCool": { "daysAfterModificationGreaterThan": 30 },
            "tierToCold": { "daysAfterModificationGreaterThan": 90 },
            "tierToArchive": { "daysAfterModificationGreaterThan": 180 },
            "delete": { "daysAfterModificationGreaterThan": 2555 }
          },
          "snapshot": {
            "delete": { "daysAfterCreationGreaterThan": 90 }
          },
          "version": {
            "tierToCool": { "daysAfterCreationGreaterThan": 30 },
            "delete": { "daysAfterCreationGreaterThan": 365 }
          }
        }
      }
    }
  ]
}
Apply it with the CLI:
az storage account management-policy create \
--account-name kvstgprod \
--resource-group rg-storage-prod \
--policy @lifecycle-policy.json
Key behaviors to internalize:
- prefixMatch includes the container name. logs/app/ matches container logs, prefix app/. This is the number-one reason a rule “does nothing”: people omit the container.
- Use daysAfterLastAccessTimeGreaterThan to tier on read activity instead of modification, but you must first enable last-access tracking: az storage account blob-service-properties update --enable-last-access-tracking true .... It adds a small per-transaction cost.
- Lifecycle actions on version and snapshot require versioning (or snapshots) to exist. Without them those blocks are inert.
- The engine is eventually consistent and runs daily. A rule with a 30-day threshold acts up to ~24 hours after day 30, not at the stroke of midnight. Do not build tight SLAs on lifecycle timing.
- Lifecycle delete is permanent unless soft delete (Step 4) catches it. Always pair an expiry rule with soft delete.
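Two of these mistakes can be caught before the policy ever reaches the account. The sketch below is a hypothetical pre-deploy lint, not part of any Azure tooling: it warns when a prefixMatch entry does not start with a container you know exists, and when tier and delete thresholds are not strictly increasing. The function names and the known_containers argument are assumptions made for illustration. A rule that deliberately matches on a partial container name would trigger a false warning.

```python
# Actions in the order their thresholds are expected to increase.
ACTION_ORDER = ["tierToCool", "tierToCold", "tierToArchive", "delete"]


def action_days(action: dict) -> int:
    """Return the daysAfter...GreaterThan threshold of one action."""
    return next(v for k, v in action.items() if k.startswith("daysAfter"))


def lint_rule(rule: dict, known_containers: set) -> list:
    """Return human-readable warnings for one lifecycle rule."""
    warnings = []
    definition = rule["definition"]

    # prefixMatch must start with a container name, not a path inside one.
    for prefix in definition.get("filters", {}).get("prefixMatch", []):
        container = prefix.split("/", 1)[0]
        if container not in known_containers:
            warnings.append(
                f"{rule['name']}: prefix '{prefix}' does not start with a known container")

    # Thresholds should grow from cool to cold to archive to delete.
    for target, actions in definition.get("actions", {}).items():
        days = [action_days(actions[a]) for a in ACTION_ORDER if a in actions]
        if days != sorted(days) or len(days) != len(set(days)):
            warnings.append(
                f"{rule['name']}: {target} thresholds are not strictly increasing: {days}")
    return warnings


# A deliberately broken rule: container omitted, archive earlier than cool.
broken = {
    "name": "broken-rule",
    "definition": {
        "filters": {"blobTypes": ["blockBlob"], "prefixMatch": ["app/"]},
        "actions": {"baseBlob": {
            "tierToCool": {"daysAfterModificationGreaterThan": 30},
            "tierToArchive": {"daysAfterModificationGreaterThan": 20},
        }},
    },
}
for warning in lint_rule(broken, {"logs"}):
    print(warning)
```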
3. Versioning and change feed: the recovery foundation
Blob versioning automatically creates an immutable, read-only snapshot of a blob every time it is overwritten or deleted, identified by a version ID. This is the bedrock of recovery: with versioning on, an overwrite never destroys the prior content — it just demotes it to a previous version.
az storage account blob-service-properties update \
--account-name kvstgprod \
--resource-group rg-storage-prod \
--enable-versioning true
The change feed is a complementary, ordered, durable transaction log of every create/update/delete in the account, written as Avro into a system container ($blobchangefeed). It is the audit trail and the input to point-in-time restore.
az storage account blob-service-properties update \
--account-name kvstgprod \
--resource-group rg-storage-prod \
--enable-change-feed true \
--change-feed-retention-days 90
What to know:
- Versioning has a cost shape: every version is billed storage. A blob overwritten frequently accumulates versions fast. Always pair versioning with a lifecycle version.delete rule (Step 2) to cap the tail.
- Listing a prior version and promoting it back is a copy operation:
# Find versions of a blob
az storage blob list \
--account-name kvstgprod --container-name app-data \
--prefix config.json --include v \
--query "[].{name:name, versionId:versionId, current:isCurrentVersion}" -o table
# Restore a specific version by copying it over the current blob
az storage blob copy start \
--account-name kvstgprod \
--destination-container app-data --destination-blob config.json \
--source-uri "https://kvstgprod.blob.core.windows.net/app-data/config.json?versionid=2026-06-01T10:15:30.1234567Z"
- Versioning is a prerequisite for point-in-time restore, and so is change feed. Both must be enabled before you can turn PITR on.
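When the version listing is long, choosing which version to promote is easier to script than to eyeball. This is a minimal sketch that takes rows shaped like the output of the list query above (name, versionId, current) and returns the newest non-current version of one blob. It assumes version IDs are fixed-width UTC timestamps, so string comparison orders them chronologically. The function name is hypothetical.

```python
def latest_previous_version(versions: list, blob_name: str):
    """Return the newest non-current versionId for blob_name, or None.

    A prefix listing can return other blobs as well, so rows are filtered
    by exact name. A missing or null 'current' field counts as non-current.
    """
    previous = [
        v["versionId"]
        for v in versions
        if v["name"] == blob_name and not v.get("current")
    ]
    return max(previous, default=None)


rows = [
    {"name": "config.json", "versionId": "2026-06-01T10:15:30.1234567Z", "current": None},
    {"name": "config.json", "versionId": "2026-06-02T09:00:00.0000000Z", "current": None},
    {"name": "config.json", "versionId": "2026-06-03T09:00:00.0000000Z", "current": True},
]
print(latest_previous_version(rows, "config.json"))
```

The returned ID is what goes into the source URI of the copy that promotes the version.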
4. Soft delete for blobs and containers
Soft delete is the seatbelt. With it enabled, a deleted blob (or an overwritten one, if versioning is off) is retained in a recoverable state for a retention window instead of being purged. There are two independent soft-delete features, and you want both:
# Blob-level soft delete: recovers individual deleted/overwritten blobs
az storage account blob-service-properties update \
--account-name kvstgprod \
--resource-group rg-storage-prod \
--enable-delete-retention true \
--delete-retention-days 14
# Container-level soft delete: recovers an entire deleted container
az storage account blob-service-properties update \
--account-name kvstgprod \
--resource-group rg-storage-prod \
--enable-container-delete-retention true \
--container-delete-retention-days 14
The distinction matters: blob soft delete does not save you if someone deletes the whole container — the blobs go with it. Container soft delete covers that, but only the container as a unit (you cannot restore a single blob from a soft-deleted container; you restore the container, then recover blobs).
Choosing a retention window:
- 7-14 days is the common sweet spot for operational recovery (caught the bad delete within a sprint).
- 30+ days if your detection-to-remediation loop is slow or compliance demands it. Longer windows cost storage for everything deleted in that window.
- Retention is per-account and applies uniformly; you cannot set different windows per container.
Recover a soft-deleted blob:
az storage blob undelete \
--account-name kvstgprod \
--container-name app-data \
--name important.parquet
Soft delete protects against deletion, not against a malicious actor with permission to change the retention setting. Lock down Microsoft.Storage/storageAccounts/blobServices/write with Azure RBAC and a deny assignment or resource lock so the seatbelt cannot be quietly unbuckled.
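The retention-window choice comes down to one question: will you notice the bad delete before the blob is purged? A small sketch of that arithmetic follows. It assumes the retention clock starts at the moment of deletion and that the retention setting is not changed in between. Both function names are illustrative.

```python
from datetime import datetime, timedelta, timezone


def purge_time(deleted_at: datetime, retention_days: int) -> datetime:
    """Moment after which a soft-deleted blob is no longer recoverable."""
    return deleted_at + timedelta(days=retention_days)


def is_recoverable(deleted_at: datetime, retention_days: int,
                   now: datetime) -> bool:
    """True while the blob is still inside the soft-delete window."""
    return now < purge_time(deleted_at, retention_days)


deleted = datetime(2026, 6, 1, 8, 0, tzinfo=timezone.utc)
# Noticed 9 days later with a 14-day window: still recoverable.
print(is_recoverable(deleted, 14, deleted + timedelta(days=9)))
# Noticed 15 days later: already purged.
print(is_recoverable(deleted, 14, deleted + timedelta(days=15)))
```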
5. Point-in-time restore (PITR)
PITR restores a set of block blobs (by container/prefix) to their state at a chosen timestamp in the past. It is your “undo the last hour across thousands of objects” button — exactly what you reach for after a bad bulk job. It works by reading the change feed and reverting via versions, which is why both are hard prerequisites.
The dependency chain, enabled in this order:
- Blob versioning on.
- Change feed on.
- Blob soft delete on.
- PITR on, with restore-days strictly less than the soft-delete retention.
az storage account blob-service-properties update \
--account-name kvstgprod \
--resource-group rg-storage-prod \
--enable-restore-policy true \
--restore-days 13
The constraint that fails deployments: restore-days must be less than delete-retention-days. Set soft delete to 14 and PITR to 13, not 14. PITR cannot restore past the soft-delete horizon because the deleted blobs it needs would already be purged.
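The dependency chain is easy to check mechanically before a deployment fails on it. The sketch below takes a dict shaped like the JSON that az storage account blob-service-properties show returns (the same fields the Verify section queries) and lists every violated prerequisite. The function name and message wording are assumptions.

```python
def pitr_problems(props: dict) -> list:
    """List violated PITR prerequisites for a blob-service-properties dict."""
    problems = []
    if not props.get("isVersioningEnabled"):
        problems.append("versioning is off")
    if not (props.get("changeFeed") or {}).get("enabled"):
        problems.append("change feed is off")

    soft_delete = props.get("deleteRetentionPolicy") or {}
    if not soft_delete.get("enabled"):
        problems.append("blob soft delete is off")

    restore = props.get("restorePolicy") or {}
    if restore.get("enabled"):
        # restore-days must be strictly less than delete-retention-days.
        if (restore.get("days") or 0) >= (soft_delete.get("days") or 0):
            problems.append("restore-days must be less than delete-retention-days")
    return problems


props = {
    "isVersioningEnabled": True,
    "changeFeed": {"enabled": True},
    "deleteRetentionPolicy": {"enabled": True, "days": 14},
    "restorePolicy": {"enabled": True, "days": 14},  # off by one
}
print(pitr_problems(props))
```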
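Run a restore (this reverts the range to its state at the given UTC timestamp):
# --blob-range takes a start (inclusive) and an end (exclusive), each written as container/blob.
# "orders0" sorts immediately after every name that begins with "orders/".
az storage blob restore \
--account-name kvstgprod \
--resource-group rg-storage-prod \
--time-to-restore "2026-06-08T08:00:00Z" \
--blob-range "app-data/orders/" "app-data/orders0"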
Operational limits worth knowing:
- PITR restores block blobs only. Append/page blobs, snapshots, metadata-only changes, and container operations are out of scope.
- A restore is a forward operation that overwrites current state for the range, so it is destructive to anything written after the restore point. Treat it like a git reset --hard: scope the range tightly.
- PITR is incompatible with HNS (Data Lake Gen2) accounts. If you are on ADLS Gen2, this feature is not available, and at the time of writing neither is blob versioning. Soft delete is your fallback, so check the current feature-support matrix before you design around it.
- The restore is asynchronous and can take a while for large ranges; poll the operation.
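Scoping the range tightly means computing an end bound that covers the prefix and nothing more. The sketch below assumes that restore ranges are compared lexicographically on the container/blob path, with the start inclusive and the end exclusive, and that names are plain ASCII. Under those assumptions the tightest end bound is the start with its last character bumped by one code point. The helper name is hypothetical.

```python
def prefix_range(container: str, prefix: str = "") -> tuple:
    """Return (start, end) covering every blob under container/prefix.

    The end is the start with its final character replaced by the next
    code point, which sorts after every name that begins with the start.
    """
    start = f"{container}/{prefix}"
    end = start[:-1] + chr(ord(start[-1]) + 1)
    return start, end


# One prefix inside a container.
print(prefix_range("app-data", "orders/"))
# A whole container: "/" is followed by "0" in ASCII.
print(prefix_range("test"))
```

A padded guess such as orders/zzz would miss any blob whose name sorts after zzz; the computed bound has no such gap.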
6. Immutable storage: time-based retention and legal hold
Immutability is a different concern from recovery. It is WORM (Write Once, Read Many): once set, data cannot be modified or deleted by anyone — not an admin, not the subscription owner — until the policy releases it. This is what satisfies SEC 17a-4, FINRA, CFTC, and similar regulatory retention mandates.
There are two policy types, and they compose:
- Time-based retention: blobs are immutable for N days from creation/policy time. Deletion is blocked until the interval elapses.
- Legal hold: blobs are immutable indefinitely, tagged with one or more legal-hold tags, until every tag is explicitly cleared. Used for litigation/investigation where you do not know the end date.
Immutability policies are scoped to a container or, with version-level immutability support, to individual blob versions. Version-level immutability depends on blob versioning, so turn versioning on first and then create the container with version-level support enabled:
# Prerequisite for version-level immutability: blob versioning (account level)
az storage account blob-service-properties update \
--account-name kvstgprod \
--resource-group rg-storage-prod \
--enable-versioning true
# Container with version-level immutability support enabled
az storage container-rm create \
--storage-account kvstgprod \
--resource-group rg-storage-prod \
--name compliance-archive \
--enable-vlw true
Set an unlocked time-based policy (5 years) on the container so you can test before committing:
az storage container immutability-policy create \
--account-name kvstgprod \
--resource-group rg-storage-prod \
--container-name compliance-archive \
--period 1825 \
--allow-protected-append-writes true
Apply a legal hold:
# Legal-hold tags must be 3 to 23 alphanumeric characters (no hyphens).
az storage container legal-hold set \
--account-name kvstgprod \
--resource-group rg-storage-prod \
--container-name compliance-archive \
--tags "litigation20260481"
# Clear it when released:
az storage container legal-hold clear \
--account-name kvstgprod --resource-group rg-storage-prod \
--container-name compliance-archive --tags "litigation20260481"
allow-protected-append-writes is the pragmatic flag for log/append workloads: it lets you keep appending to existing append blobs while still blocking overwrites and deletes of committed data. Without it, even an append is rejected once the policy is on.
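Legal-hold tags are worth validating before they reach the CLI, because a rejected tag in the middle of a litigation request is a bad time to learn the naming rule. This sketch assumes the documented constraint of 3 to 23 alphanumeric characters per tag. The function name and the example case IDs are made up for illustration.

```python
import re

# Assumed constraint: 3 to 23 characters, letters and digits only.
_TAG_PATTERN = re.compile(r"[A-Za-z0-9]{3,23}")


def is_valid_legal_hold_tag(tag: str) -> bool:
    """True if the tag satisfies the assumed legal-hold naming rule."""
    return _TAG_PATTERN.fullmatch(tag) is not None


for candidate in ["case202601", "case-2026-01", "ab"]:
    print(candidate, is_valid_legal_hold_tag(candidate))
```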
7. Locking policies and the implications for deletion
An unlocked time-based policy can be edited or deleted by an authorized user — it is for testing and ramp-up. A locked policy is the real compliance control, and locking is irreversible.
# Lock the policy. This requires the policy's current etag and CANNOT be undone.
ETAG=$(az storage container immutability-policy show \
--account-name kvstgprod --resource-group rg-storage-prod \
--container-name compliance-archive --query etag -o tsv)
az storage container immutability-policy lock \
--account-name kvstgprod --resource-group rg-storage-prod \
--container-name compliance-archive \
--if-match "$ETAG"
What locking means in practice — read this twice before you run it in production:
- Once locked, you can never shorten the retention period or delete the policy. You can only extend it, and a locked container-level policy allows at most five extensions. There is no support escalation that undoes a locked policy. If you set 10 years by mistake, that container is immutable for 10 years.
- Blobs under a locked policy cannot be deleted until their individual retention interval elapses. The container itself cannot be deleted while it holds any immutable blob. This is by design — it is the entire point of WORM.
- A locked immutability policy can block lifecycle expiry. A lifecycle delete action and a WORM retention interval are in tension: the delete will fail (or be skipped) for blobs still inside their immutable window. Reconcile your lifecycle expiry thresholds with your retention period so they do not fight.
- Deleting the storage account is blocked while a locked policy exists. This is the correct, regulator-friendly behavior, but it surprises teams doing environment teardown: your test subscription cleanup will fail on that one container.
Treat locking like signing a contract. In code review, a PR that locks an immutability policy should require the same scrutiny as one that deletes a database — because it is just as irreversible, in the opposite direction.
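The lifecycle-versus-WORM tension can be flagged in review before anything is locked. The sketch below compares each enabled rule's base-blob delete threshold with the immutability period and reports rules that would try to delete too early. It compares raw day counts only: it ignores prefix scoping and the fact that lifecycle may count from last modification while retention counts from creation. Treat a hit as a prompt to look closer, not as proof. The function name is hypothetical.

```python
def conflicting_deletes(policy: dict, immutability_days: int) -> list:
    """Return (rule name, delete threshold) pairs below the retention period."""
    conflicts = []
    for rule in policy.get("rules", []):
        if not rule.get("enabled", True):
            continue
        delete = rule["definition"]["actions"].get("baseBlob", {}).get("delete")
        if not delete:
            continue
        days = next(v for k, v in delete.items() if k.startswith("daysAfter"))
        if days < immutability_days:
            conflicts.append((rule["name"], days))
    return conflicts


policy = {"rules": [{
    "enabled": True,
    "name": "tier-and-expire-logs",
    "definition": {"actions": {"baseBlob": {
        "delete": {"daysAfterModificationGreaterThan": 2555},
    }}},
}]}
print(conflicting_deletes(policy, 1825))  # 5-year retention
print(conflicting_deletes(policy, 3650))  # 10-year retention
```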
8. Validating recovery and estimating cost
Protection you have not exercised is a hypothesis. Two things to actually do.
Validate the recovery paths in a non-prod account that mirrors prod settings:
# 1) Soft delete works: delete then undelete, confirm content matches.
az storage blob delete --account-name kvstgnonprod -c test -n probe.txt
az storage blob undelete --account-name kvstgnonprod -c test -n probe.txt
# 2) Versioning works: overwrite, list versions, restore prior, diff.
az storage blob list --account-name kvstgnonprod -c test \
--prefix probe.txt --include v -o table
# 3) PITR works: write garbage, restore to a pre-garbage timestamp, verify.
# The range is start-inclusive and end-exclusive; "test/" to "test0" covers the whole container.
az storage blob restore --account-name kvstgnonprod -g rg-nonprod \
--time-to-restore "2026-06-08T08:00:00Z" \
--blob-range "test/" "test0"
# 4) Immutability works: confirm a delete is REJECTED (this should fail).
az storage blob delete --account-name kvstgnonprod -c compliance-archive -n locked.bin
The fourth check is the important one: a passing immutability test is a failed delete. If that delete succeeds, your WORM control is not actually protecting anything.
Estimate the protection cost before you flip everything on. The cost drivers, in rough order of impact:
- Versions and snapshots: every overwrite of a frequently-changed blob is a new billed copy. High-churn blobs (state files, manifests) can multiply storage 5-10x without a version.delete lifecycle rule. This is usually the biggest surprise.
- Soft-delete retention: everything deleted in the window keeps billing for the window length. A workload that writes-then-deletes a lot of temp data pays for all of it for delete-retention-days.
- Change feed: a transaction log proportional to write volume, retained for change-feed-retention-days. Cheap relative to data, but non-zero.
- Early-deletion penalties: aggressive lifecycle tier-down on short-lived data, as covered in Step 1.
Audit current write and delete activity with a log query (Log Analytics, assuming the account's diagnostic settings send blob resource logs to the workspace):
StorageBlobLogs
| where TimeGenerated > ago(7d)
| where OperationName in ("PutBlob", "PutBlock", "DeleteBlob", "CopyBlob")
| summarize Operations = count(), Bytes = sum(RequestBodySize)
by OperationName, bin(TimeGenerated, 1d)
| order by TimeGenerated desc
Use the operation/byte profile to model what versioning and soft delete will add: roughly, extra storage ~ (overwrite + delete volume) x retention/version lifetime. If the number is uncomfortable, tighten the lifecycle version.delete threshold rather than disabling protection.
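That rule of thumb can be written down as a steady-state estimate. The sketch assumes constant daily churn, that each overwrite keeps a full-size prior version for the version lifetime, and that each delete keeps the data for the soft-delete window. Real bills may be lower where versions share blocks, so read the result as an upper bound. All input figures are hypothetical.

```python
def extra_storage_gb(overwrite_gb_per_day: float, version_lifetime_days: int,
                     delete_gb_per_day: float, soft_delete_days: int) -> float:
    """Steady-state extra storage held by versions and soft-deleted data."""
    versions = overwrite_gb_per_day * version_lifetime_days
    soft_deleted = delete_gb_per_day * soft_delete_days
    return versions + soft_deleted


# 20 GB/day overwritten and 5 GB/day deleted, with a 14-day soft-delete window.
print(extra_storage_gb(20, 365, 5, 14))  # versions kept for a year
print(extra_storage_gb(20, 30, 5, 14))   # versions kept for 30 days
```

Comparing the two calls shows why the version lifetime is the lever to pull first: the soft-delete term barely moves the total.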
Verify
Confirm the full stack with read-only commands. This is the post-deploy gate.
# Service-level protection settings in one shot
az storage account blob-service-properties show \
--account-name kvstgprod --resource-group rg-storage-prod \
--query "{ versioning: isVersioningEnabled, \
changeFeed: changeFeed.enabled, \
blobSoftDelete: deleteRetentionPolicy.enabled, \
blobSoftDeleteDays: deleteRetentionPolicy.days, \
containerSoftDelete: containerDeleteRetentionPolicy.enabled, \
pitr: restorePolicy.enabled, \
pitrDays: restorePolicy.days }" -o json
# Lifecycle policy present and enabled
az storage account management-policy show \
--account-name kvstgprod --resource-group rg-storage-prod \
--query "policy.rules[].{name:name, enabled:enabled}" -o table
# Immutability policy state on the compliance container
az storage container immutability-policy show \
--account-name kvstgprod --resource-group rg-storage-prod \
--container-name compliance-archive \
--query "{ state: state, period: immutabilityPeriodSinceCreationInDays }" -o json
Expected results: versioning/change feed/both soft deletes/PITR all true; pitrDays strictly less than blobSoftDeleteDays; the lifecycle rule enabled; the immutability policy state reading Locked (not Unlocked) for anything in production compliance scope.
Enterprise scenario
A capital-markets platform team running trade-confirmation archives on Blob Storage had a clean SEC 17a-4 design: a compliance-archive container with a locked, 7-year time-based immutability policy. Months later their FinOps automation flagged the account: storage was growing ~4% a month with no corresponding business growth. The cause was a lifecycle rule meant to expire confirmations after 7 years — it was firing daily, attempting delete on blobs that were still inside their 7-year immutable window, silently failing on every run, and doing nothing while the data accumulated. Worse, a parallel team had pointed a high-churn manifest writer at the same account with versioning on and no version.delete rule, so every manifest rewrite was minting a permanent version.
The constraint was hard: the immutable policy was locked, so they could not shorten retention or delete anything early — that was non-negotiable and, legally, exactly correct. The fix was twofold. First, they reconciled the lifecycle delete threshold with the WORM period so the expiry rule only targeted blobs past their 7-year interval, ending the daily no-op failures and letting genuinely-expired data clear. Second, they isolated the high-churn manifests into a separate, non-immutable account and added a tight version-cleanup rule. The version rule that stopped the bleed:
{
  "rules": [
    {
      "enabled": true,
      "name": "cap-manifest-versions",
      "type": "Lifecycle",
      "definition": {
        "filters": { "blobTypes": [ "blockBlob" ], "prefixMatch": [ "manifests/" ] },
        "actions": {
          "version": {
            "tierToCool": { "daysAfterCreationGreaterThan": 7 },
            "delete": { "daysAfterCreationGreaterThan": 30 }
          }
        }
      }
    }
  ]
}
The lessons that stuck: a lifecycle delete and a locked WORM policy will collide, and the collision is silent — the delete just fails and your bill keeps climbing; and immutable compliance data and high-churn operational data do not belong in the same account, because the protection settings you want for one are wrong for the other.