Every stored credential is a liability with a half-life: secrets expire at the worst moment, leak into logs and .env files, and outlive the engineer who created them. This guide walks the full path to a secret-free estate — Azure Key Vault as the system of record for the few secrets you cannot avoid, managed identities for anything running inside Azure, and workload identity federation (OIDC) to extend that passwordless model to GitHub Actions and AKS. The destination is an estate where the only thing you rotate is trust, not strings.
The reason this is hard is not the vault — creating a vault takes one command. The reason is the bootstrap: to read a secret you must first authenticate, and if that authentication is itself a stored secret you have moved the problem one hop upstream, not solved it. The entire discipline in this article is closing that last gap so that no stored credential anywhere grants access to your secrets. You will learn the exact trust assertions (issuer, subject, audience), the RBAC roles that gate the data plane, the federation subjects for each platform, and — because this is operational — the precise az and portal paths to confirm why a passwordless sign-in failed, since the failure modes are subtle and the error messages are deliberately vague.
By the end you will be able to stand up Key Vault with the right authorization and network posture, attach the right flavour of managed identity to each Azure workload, federate GitHub Actions and AKS service accounts to Entra ID with no stored secret, rotate secrets with zero downtime, and prove the whole thing is secret-free with Resource Graph and audit logs. Because you will return to this mid-incident, the federation subjects, the RBAC roles, the error codes, and the failure playbook are all laid out as scannable tables — read the prose once, then keep the tables open when a deploy fails at 02:00 with AADSTS70021.
What problem this solves
Secrets do not fail loudly. They fail at 02:00 on a Saturday when a certificate expires, or six months after an engineer leaves and their personal access token is finally revoked, or the day a .env file lands in a public repo. The pain in production terms is fourfold: expiry (a rotated database password that nobody propagated takes the app down), leakage (a secret in CI logs, an image layer, a Slack message), sprawl (the same credential copied into twelve app settings, none of which you can find when you must rotate), and attribution loss (a shared service-principal secret used by forty pipelines, so a breach implicates all of them and the audit trail names one principal for every action).
What breaks without this: teams hand-roll secret rotation and it desynchronises; they store an AZURE_CREDENTIALS JSON blob in every GitHub repo and can never rotate it without coordinating forty pipelines; they grant a runtime workload Key Vault Contributor (a control-plane role) and accidentally let it grant itself more access. The instinct — “we have a vault, we are secure” — is the trap. A vault you authenticate to with a stored secret is a vault with a key under the doormat.
Who hits this: every team running workloads that need credentials — which is every team. It bites hardest on CI/CD pipelines (the long-lived deploy credential is the single most over-privileged, most-copied secret in most estates), AKS workloads (pod-managed identity is deprecated and the migration is non-obvious), and multi-repo platforms (the 20-federated-credential ceiling arrives fast when you model identity per repository). The fix is almost never “add another secret to the vault” — it is “make the platform vouch for the workload so there is no secret to store.”
To frame the whole field before the deep dive, here is every identity-bootstrap mechanism this article covers, where the trust originates, and the one failure that defines it:
| Mechanism | Where identity originates | Use it for | Defining failure mode |
|---|---|---|---|
| System-assigned managed identity | Azure platform, bound 1:1 to one resource | A standalone Azure service whose identity should die with it | Identity vanishes on resource delete; orphaned role assignments |
| User-assigned managed identity (UAMI) | Azure platform, standalone resource | Workload families that share access; survives blue/green | Forget to attach it → workload falls back to no identity |
| Workload identity federation (FIC) | External OIDC issuer (GitHub, AKS, GitLab) | Workloads outside Azure’s IMDS reach | Subject string drift → AADSTS70021 no matching FIC |
| Key Vault reference | App setting resolved by a managed identity | Injecting a vault secret into app config without code | Identity lacks Secrets User → resolves to empty → crash loop |
| CSI Secrets Store | UAMI brokered into a pod via a webhook | Mounting vault secrets as files in AKS | Missing pod label → no token → mount fails |
Learning objectives
By the end of this article you can:
- Explain the secret-zero problem and how platform-issued identity (managed identity and federation) eliminates the bootstrap credential entirely.
- Stand up Key Vault with the correct authorization model (RBAC over access policies), soft-delete, purge protection, and private networking — and justify each choice.
- Choose between system-assigned and user-assigned managed identities per workload, and assign least-privilege Key Vault data-plane roles at the right scope.
- Configure a federated identity credential (issuer / subject / audience) and federate GitHub Actions and AKS service accounts to Entra ID with no stored secret.
- Use Key Vault references and the Secrets Store CSI driver so application code never handles a secret string, and rotate without redeploying.
- Implement zero-downtime rotation with versionless references, CSI polling, and Event Grid notifications — treating rotation as a vault-side event consumers observe.
- Audit the estate for orphaned secrets with Resource Graph, route Key Vault AuditEvent logs to Log Analytics, and alert on anomalous access.
- Diagnose a failed passwordless sign-in to a specific cause (wrong subject, missing role, vault firewall, IMDS unreachable, 20-FIC ceiling) using exact commands.
Prerequisites & where this fits
You should already understand Entra ID basics: a tenant, an app registration (the identity of an application), a service principal (the local instance of that app in a tenant), and Azure RBAC (role assignments on a scope). You should be comfortable running az in Cloud Shell, reading JSON output, and reading a Bicep resource. Familiarity with OIDC at the level of “an issuer mints a signed token with claims, a relying party validates it” is assumed; you do not need to know JOSE internals.
This sits at the centre of the Identity & Platform Security track. Upstream of it is Azure Entra ID Fundamentals: Tenants, Users, Groups & RBAC, which defines the principals you assign roles to, and Entra Managed Identities Deep Dive: User-Assigned, FIC & RBAC, which goes deeper on the identity objects themselves. It pairs tightly with Azure Key Vault: Secrets, Keys & Certificates (the data-plane objects you are protecting) and Azure Key Vault Secret Rotation with Managed Identity. The federation half generalises across clouds — see GitHub Actions OIDC: Keyless Deploys to Multi-Cloud and Workload Identity Federation for Secretless CI/CD.
A quick map of who owns each layer, so you escalate to the right team when a passwordless flow breaks:
| Layer | What lives here | Who usually owns it | Failure classes it causes |
|---|---|---|---|
| External OIDC issuer | GitHub/AKS token endpoint, sub claim shape | Platform / DevOps | AADSTS70021 (no matching FIC), subject drift |
| Entra ID (FIC + app/UAMI) | Trust assertions, app registration, role grants | Identity team | AADSTS700213, AADSTS50034, missing role |
| Key Vault control plane | Vault config, networking, RBAC model | Platform team | Privilege escalation via access policies |
| Key Vault data plane | Secret values, versions, rotation | Secret-ops + app | Forbidden (no Secrets User), empty KV reference |
| Network path | Private endpoint, vault firewall, DNS | Network team | Resolution to public IP, firewall block, timeout |
| Workload runtime | IMDS / projected SA token, SDK credential | App / dev team | DefaultAzureCredential chain failures |
Core concepts
Five mental models make every later step obvious.
Secret-zero is the only hard part. To read a secret from Key Vault, a workload must authenticate to Entra ID. If that authentication relies on a stored client secret, you have only moved the problem one hop upstream. The answer is platform-issued identity: the platform a workload runs on (an Azure VM, an AKS pod, a GitHub runner) issues it a short-lived token, and Entra ID is configured to trust that platform. No secret is stored anywhere. Everything in this article is a variation on that single idea.
Managed identity is “Azure trusts itself”; federation is “Entra trusts a named external subject.” Inside Azure, the platform mints and rotates an identity bound to a resource and exposes it via IMDS (the Instance Metadata Service at 169.254.169.254). Outside Azure, an external OIDC issuer mints a token and Entra ID validates it against a configured federated identity credential (FIC). Both paths end in a normal short-lived Entra access token and zero stored secrets. The fork is purely “is the workload inside Azure’s IMDS reach?”
A FIC is a three-field trust assertion, matched exactly. A federated identity credential says: I will accept a token from this issuer, identifying this subject, for this audience. All three must match the incoming token exactly; subjects are compared as case-sensitive strings. Issuer is the OIDC issuer URL; subject is the sub claim (a repo+environment, or a Kubernetes service account); audience for Entra in the public cloud is api://AzureADTokenExchange (sovereign clouds use their own variants). Get one character wrong and Entra returns “no matching federated identity credential,” not “access denied,” a distinction that wastes hours if you do not know it.
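The exact-match rule is easy to model. The sketch below is a plain-Python illustration, not Entra's implementation: `FederatedCredential` and `match_fic` are hypothetical names, and the claims are hand-written dictionaries rather than a real token.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FederatedCredential:
    """Hypothetical in-memory stand-in for a FIC (not an Azure SDK type)."""
    name: str
    issuer: str
    subject: str
    audience: str = "api://AzureADTokenExchange"


def match_fic(claims: dict, fics: list[FederatedCredential]) -> Optional[str]:
    """Return the name of the first FIC whose issuer, subject and audience
    all equal the token's iss/sub/aud claims, or None when nothing matches."""
    for fic in fics:
        if (
            claims.get("iss") == fic.issuer
            and claims.get("sub") == fic.subject
            and claims.get("aud") == fic.audience
        ):
            return fic.name
    return None


fics = [
    FederatedCredential(
        name="gh-orders-prod-env",
        issuer="https://token.actions.githubusercontent.com",
        subject="repo:contoso/orders-api:environment:prod",
    )
]

good = {
    "iss": "https://token.actions.githubusercontent.com",
    "sub": "repo:contoso/orders-api:environment:prod",
    "aud": "api://AzureADTokenExchange",
}
# Same token, but the environment name differs only in letter case.
drifted = dict(good, sub="repo:contoso/orders-api:environment:Prod")

print(match_fic(good, fics))     # gh-orders-prod-env
print(match_fic(drifted, fics))  # None
```

The comparison is plain string equality on purpose: there is no normalisation step that would forgive a capital letter or a trailing slash.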
Authorization has two planes, and confusing them is the classic mistake. The control plane (manage the vault: create it, set networking, assign roles) is governed by Azure RBAC roles like Key Vault Contributor. The data plane (read/write secret values) is governed either by legacy access policies or by Azure RBAC data roles like Key Vault Secrets User. A runtime workload needs data-plane read and nothing else; giving it Contributor lets it grant itself more — a privilege-escalation path that RBAC-for-data-plane closes.
Rotation is a vault-side event consumers observe, never a coordinated deploy. The discipline is: store each secret in exactly one place (the vault), reference it versionlessly everywhere, and let resolvers follow the current version. A versioned URI or a hardcoded value anywhere reintroduces a rotation outage. Done right, rotating a secret is one operation in the vault; every consumer picks it up on its own refresh cadence.
The vocabulary in one table
Before the deep sections, pin down every moving part. The glossary repeats these for lookup; this table is the model side by side:
| Concept | One-line definition | Where it lives | Why it matters here |
|---|---|---|---|
| Key Vault | Managed store for secrets, keys, certs | Resource group | The system of record for unavoidable secrets |
| Secret-zero | The bootstrap credential you must not have | (nowhere, ideally) | The whole problem this article solves |
| Managed identity | Platform-minted identity for an Azure resource | Entra + resource | Passwordless auth inside Azure |
| UAMI | Standalone, reusable managed identity | Its own resource | Shared access across a workload family |
| IMDS | Metadata endpoint that issues the token | 169.254.169.254 | Where in-Azure workloads get their token |
| FIC | Federated identity credential (trust assertion) | On an app or UAMI | Lets Entra trust an external OIDC subject |
| Issuer / subject / audience | The three fields a FIC matches | In the FIC + token | All must match exactly or sign-in fails |
| Access policy | Legacy flat data-plane permission list | On the vault | The escalation-prone model to avoid |
| RBAC data role | Secrets User/Officer/Administrator | Role assignment | The recommended least-privilege model |
| Key Vault reference | @Microsoft.KeyVault(SecretUri=…) | App setting | Injects a secret without code seeing it |
| CSI Secrets Store | Mounts vault secrets as files in a pod | AKS add-on | Workload-identity-mode secret mounting |
| Versionless URI | SecretUri ending in / (no version) | Reference / config | The foundation of zero-downtime rotation |
The authorization & error reference
Before the per-step detail, here is the lookup table you scan first when a passwordless flow fails: every error you realistically see across Key Vault, managed identity, and federation, what it means, the likely cause, how to confirm it, and the fix. The non-obvious ones are the AADSTS codes (Entra token-exchange failures) and the difference between a control-plane 403 and a data-plane Forbidden.
| Code / error | Where it surfaces | Likely cause | How to confirm | First fix |
|---|---|---|---|---|
| AADSTS70021 / AADSTS700213 No matching federated identity record (subject) | azure/login, token exchange | Token sub does not match any FIC subject | Compare workflow environment/ref to FIC subject | Fix the subject string to match exactly |
| AADSTS700211 No matching federated identity record for issuer | Token exchange | Issuer URL wrong (trailing-slash mismatch) or not configured at all | az ad app federated-credential list vs token iss | Correct the FIC issuer URL, or add the FIC for that issuer/subject |
| AADSTS700016 / AADSTS50034 App / user not found in directory | azure/login | Wrong client-id / SP not created | az ad sp show --id <appId> | az ad sp create --id; fix client-id |
| AADSTS7000215 Invalid client secret provided | azure/login | A secret is still being sent (not OIDC) | Workflow uses creds: JSON, not OIDC | Remove AZURE_CREDENTIALS; use id-token: write |
| Forbidden (data plane) | az keyvault secret show | Identity lacks Key Vault Secrets User | az role assignment list --assignee <pid> | Grant Secrets User at vault scope |
| 403 (control plane) | az keyvault update | Identity lacks Key Vault Contributor | Role list on the vault scope | Grant control role to the operator |
| ForbiddenByFirewall | Any data-plane call | Vault firewall blocks the caller | Vault → Networking shows “selected networks” | Add IP / private endpoint / trusted services |
| KV reference empty / app crash | App boot | Identity not enabled or lacks role; bad URI | Environment variables blade red error | Enable identity; grant role; fix SecretUri |
| SecretNotFound (404) | Resolve | Secret deleted/disabled, or wrong vault name | az keyvault secret show 404 | Restore/enable secret; correct vault |
| maximum allowed value of 20 | federated-credential create | 20-FIC ceiling on the app/UAMI | az ad app federated-credential list \| length | Consolidate via env scoping / flexible FIC |
| Conflict on purge | az keyvault purge | Purge protection blocks hard-delete | Vault shows enablePurgeProtection: true | Wait out retention; this is by design |
Three reading notes that save the most time:
| Distinction | The trap | How to tell them apart |
|---|---|---|
| AADSTS700213 / 70021 (subject) vs AADSTS700211 (issuer) | All say “no matching federated identity” | 700213 names the subject and 700211 names the issuer; the older 70021 lists issuer, subject, and audience together, so check which field differs |
| Control-plane 403 vs data-plane Forbidden | Both look like “permission denied” | 403 on vaults/write-type ops = control-plane RBAC; Forbidden on secrets/getSecret = data role/policy |
| “No matching FIC” vs “access denied” | You add a role when the subject is wrong | If the token never exchanged, it is a FIC/subject problem, not RBAC — no token reached the data plane |
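The third note, deciding whether a failure is a federation problem or a role problem before changing anything, can be expressed as a small triage function. This is a sketch only: `triage` and its category names are made up for illustration, and the sample messages are abbreviated, not verbatim service output.

```python
def triage(error_text: str) -> str:
    """Map an error message to the layer to investigate first.

    Categories are illustrative labels, not Azure terminology.
    """
    text = error_text.lower()
    # Check the firewall case before the generic "forbidden" case,
    # because "ForbiddenByFirewall" also contains "forbidden".
    if "forbiddenbyfirewall" in text:
        return "network"
    if "no matching federated identity" in text:
        return "federation"       # token never exchanged; RBAC is irrelevant
    if "aadsts" in text:
        return "entra-sign-in"    # other Entra failure (client id, secret, ...)
    if "forbidden" in text:
        return "data-plane-rbac"  # token was issued, but the role is missing
    return "unknown"


samples = [
    "AADSTS70021: No matching federated identity record found",
    "AADSTS7000215: Invalid client secret provided",
    "ForbiddenByFirewall: client address is not authorized",
    "Forbidden: caller is not authorized to perform action",
]
for message in samples:
    print(f"{triage(message):16} <- {message}")
```

The order of the checks encodes the rule from the table: a federation failure means no token reached the data plane, so adding a role cannot help.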
Step 1 — Key Vault foundations
Before federating anything, get the vault right. Two decisions dominate: the authorization model and data protection. Both are one-way doors in practice.
RBAC over access policies. Legacy access policies are a flat list on the vault; anyone with Microsoft.KeyVault/vaults/write (Contributor, Key Vault Contributor) can grant themselves data access — a privilege-escalation path. Azure RBAC uses the standard role-assignment plane, supports scoping down to an individual secret, and is the recommended model. As of recent Key Vault API versions, RBAC is the default for newly created vaults.
az keyvault create \
--name kv-plat-prod-001 \
--resource-group rg-platform-prod \
--location australiaeast \
--enable-rbac-authorization true \
--enable-purge-protection true \
--retention-days 90 \
--public-network-access Disabled \
--sku standard
resource kv 'Microsoft.KeyVault/vaults@2023-07-01' = {
  name: 'kv-plat-prod-001'
  location: location
  properties: {
    tenantId: subscription().tenantId
    sku: { family: 'A', name: 'standard' }
    enableRbacAuthorization: true // RBAC data plane, not access policies
    enableSoftDelete: true // always on; explicit for clarity
    softDeleteRetentionInDays: 90
    enablePurgeProtection: true // irreversible; production default
    publicNetworkAccess: 'Disabled'
    networkAcls: { defaultAction: 'Deny', bypass: 'AzureServices' }
  }
}
The two authorization models, side by side — pick RBAC unless you have a specific legacy reason:
| Dimension | Access policies (legacy) | Azure RBAC (recommended) |
|---|---|---|
| Granularity | Per-vault only (all secrets) | Per-vault, per-object (down to one secret) |
| Escalation risk | High: vaults/write can self-grant data | Low: data roles are separate from control |
| Where it lives | A list on the vault resource | Standard role assignments (auditable centrally) |
| Max entries | ~1024 policies per vault | RBAC role-assignment limits per scope |
| PIM / just-in-time | Not supported | Supported (eligible roles, activation) |
| Default for new vaults | Off | On (recent API versions) |
| Use it when | A legacy tool hard-codes policy APIs | Everything else |
The data-plane RBAC roles you will actually use, and who gets each:
| Role | Grants | Assign to | Never assign to |
|---|---|---|---|
| Key Vault Secrets User | Read secret values | Runtime workloads (MI, federated apps) | Humans by default |
| Key Vault Secrets Officer | Create/update/delete secrets | CI/CD that seeds secrets; secret-ops | Runtime app identities |
| Key Vault Certificates Officer | Manage certificates | PKI automation, cert-ops | Runtime app identities |
| Key Vault Crypto User | Use keys (wrap/unwrap/sign) | Apps doing envelope encryption | Anyone needing only secrets |
| Key Vault Crypto Officer | Manage keys (create/rotate/delete) | Key-ops, HSM admins | Runtime app identities |
| Key Vault Administrator | All data-plane ops | Break-glass, platform admins only | Pipelines, runtime workloads |
| Key Vault Reader | Read vault metadata (not values) | Auditors, inventory tooling | — |
Assign least privilege at the secret scope where you can, and never hand a runtime workload more than Secrets User:
az role assignment create \
--role "Key Vault Secrets User" \
--assignee-object-id "$APP_PRINCIPAL_ID" \
--assignee-principal-type ServicePrincipal \
--scope "/subscriptions/$SUB/resourceGroups/rg-platform-prod/providers/Microsoft.KeyVault/vaults/kv-plat-prod-001/secrets/orders-db-conn"
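The scope string in that command is an ordinary resource ID with an optional /secrets/<name> suffix. A minimal sketch of composing it, assuming a hypothetical helper named `key_vault_scope` and a placeholder subscription ID:

```python
def key_vault_scope(subscription_id: str, resource_group: str,
                    vault: str, secret: str = "") -> str:
    """Build an RBAC scope for a vault, or for one secret inside it."""
    scope = (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.KeyVault/vaults/{vault}"
    )
    if secret:
        # Narrowest scope: the role applies to this one secret only.
        scope += f"/secrets/{secret}"
    return scope


SUB = "00000000-0000-0000-0000-000000000000"  # placeholder subscription ID

vault_scope = key_vault_scope(SUB, "rg-platform-prod", "kv-plat-prod-001")
secret_scope = key_vault_scope(SUB, "rg-platform-prod", "kv-plat-prod-001",
                               secret="orders-db-conn")
print(vault_scope)
print(secret_scope)
```

Generating the scope from parts keeps the secret-level and vault-level assignments visibly distinct in review, which is where over-broad grants tend to slip through.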
Soft-delete and purge protection. Soft-delete (always on) recovers a deleted vault or secret within the retention window. Purge protection blocks even a privileged actor from hard-deleting before that window elapses, defeating a ransomware-style destroy. It is irreversible once enabled — that is the point. The data-protection knobs and their trade-offs:
| Setting | Values | Default | When to change | Trade-off / gotcha |
|---|---|---|---|---|
| enableSoftDelete | true (forced) | true | Cannot disable | Deleted objects occupy the namespace until purged |
| softDeleteRetentionInDays | 7–90 | 90 | Lower only for cost/test | Can’t reuse a soft-deleted name until purge/retention |
| enablePurgeProtection | true / (unset) | unset | Always on in prod | Irreversible; blocks redeploy that recreates the same vault name |
| enableRbacAuthorization | true / false | true (new) | Keep true | Switching mid-life requires re-granting data access |
| publicNetworkAccess | Enabled / Disabled | Enabled | Disabled in prod | Disabling without a private path locks out your own pipelines |
| networkAcls.defaultAction | Allow / Deny | Allow | Deny in prod | Deny without bypass: AzureServices breaks some integrations |
| sku.name | standard / premium | standard | premium for HSM-backed keys | Premium costs more; only needed for FIPS 140-2 L2 keys |
Network isolation. --public-network-access Disabled plus a private endpoint keeps the data plane off the internet. Pair it with a Key Vault firewall that allows trusted Azure services so platform integrations still resolve. The network options, ordered by how locked-down they are:
| Posture | What it does | Effort | Use it for | Watch-out |
|---|---|---|---|---|
| Public, no firewall | Reachable from anywhere with RBAC | None | Dev/throwaway only | Data plane on the internet |
| Public + IP firewall | Allow-listed source IPs only | Low | Small fixed egress sets | Cloud Shell / runner IPs drift |
| Trusted services bypass | Allow Azure platform integrations | Low | App Service KV references, etc. | Broad “Azure services,” not your tenant only |
| Private endpoint | Vault gets a private IP in your VNet | Medium | Production default | Needs privatelink.vaultcore.azure.net DNS |
| Private + public disabled | Only the VNet path resolves | Medium | Strict isolation/compliance | Pipelines need a private path or self-hosted runner |
Step 2 — Managed identities, decoded
Inside Azure, you almost never need federation — you need a managed identity. There are two flavours, and choosing wrong creates real operational pain.
- System-assigned: lifecycle tied 1:1 to a single resource — created and deleted with it. Good for a standalone service where the identity should never outlive the workload.
- User-assigned (UAMI): a standalone resource you create once and attach to many workloads. This is what you want at platform scale: assign Key Vault RBAC to the UAMI once, and every App Service, VM, or AKS pod that carries it inherits access. It also survives blue/green resource replacement.
# A UAMI shared across a workload family
az identity create \
--name id-orders-api \
--resource-group rg-platform-prod \
--location australiaeast
APP_PRINCIPAL_ID=$(az identity show -n id-orders-api -g rg-platform-prod --query principalId -o tsv)
APP_CLIENT_ID=$(az identity show -n id-orders-api -g rg-platform-prod --query clientId -o tsv)
The two flavours, decided as a table — most platforms standardise on UAMI:
| Dimension | System-assigned | User-assigned (UAMI) |
|---|---|---|
| Lifecycle | Born/dies with the resource | Independent resource |
| Reuse across workloads | No (1:1) | Yes (1:many) |
| Survives blue/green replace | No (new identity each time) | Yes (re-attach the same UAMI) |
| Role-assignment churn | Re-grant on every recreate | Grant once, inherit everywhere |
| Best for | A single standalone service | A workload family / platform scale |
| Federation target | Cannot hold a FIC | Can hold FICs (AKS, external) |
| Cleanup risk | Auto-cleaned with resource | Orphaned role assignments if forgotten |
| Cost | Free | Free |
Where you can attach a managed identity, and how the token is delivered — this determines whether you even can use one:
| Host | MI support | Token delivery | Notes / limit |
|---|---|---|---|
| App Service / Functions | System + user | IMDS-like endpoint (env-injected) | Multiple UAMIs allowed; pick one for KV references |
| Virtual Machine / VMSS | System + user | IMDS 169.254.169.254 | UAMI must be assigned to the VM |
| AKS (workload identity) | UAMI via FIC | Projected SA token → exchange | Pod-managed identity is deprecated |
| Container Apps | System + user | Managed endpoint | Similar to App Service |
| Logic Apps (Standard) | System + user | Managed endpoint | Use for connectors needing KV |
| Azure DevOps / GitHub | App + FIC (federation) | External OIDC, not IMDS | No IMDS off-Azure → federation, not MI |
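For the VM / VMSS row, token delivery is a plain HTTP GET against IMDS with a Metadata: true header. The sketch below only builds the request so it can run anywhere; `build_imds_request` is a hypothetical helper, the client ID is a placeholder, and the request would have to be sent from inside an Azure VM to return a token.

```python
import urllib.parse
import urllib.request

IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_imds_request(resource: str, client_id: str = "") -> urllib.request.Request:
    """Build (but do not send) an IMDS managed-identity token request.

    resource  -- the API the token is for, e.g. Key Vault's data plane
    client_id -- set it to pick a specific UAMI when several are attached
    """
    params = {"api-version": "2018-02-01", "resource": resource}
    if client_id:
        params["client_id"] = client_id
    url = f"{IMDS_TOKEN_URL}?{urllib.parse.urlencode(params)}"
    # IMDS rejects requests that lack this header.
    return urllib.request.Request(url, headers={"Metadata": "true"})


req = build_imds_request("https://vault.azure.net")
print(req.get_method(), req.full_url)

# On an Azure VM only (assumption: an identity is attached):
#   import json
#   token = json.load(urllib.request.urlopen(req, timeout=5))["access_token"]
```

In application code the Azure SDK credential classes wrap this call; the raw request is mainly useful when checking whether IMDS is reachable at all.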
For an App Service, attach the UAMI and point app settings at the vault using Key Vault references — the platform resolves them at startup using the identity, so your code never sees a secret string:
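az webapp identity assign \
--name app-orders-prod --resource-group rg-platform-prod \
--identities "/subscriptions/$SUB/resourceGroups/rg-platform-prod/providers/Microsoft.ManagedIdentity/userAssignedIdentities/id-orders-api"
# Key Vault references default to the system-assigned identity; point them at the UAMI
az webapp update \
--name app-orders-prod --resource-group rg-platform-prod \
--set keyVaultReferenceIdentity="/subscriptions/$SUB/resourceGroups/rg-platform-prod/providers/Microsoft.ManagedIdentity/userAssignedIdentities/id-orders-api"
az webapp config appsettings set \
--name app-orders-prod --resource-group rg-platform-prod \
--settings "Db__ConnString=@Microsoft.KeyVault(SecretUri=https://kv-plat-prod-001.vault.azure.net/secrets/orders-db-conn/)"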
resource site 'Microsoft.Web/sites@2023-12-01' = {
  name: 'app-orders-prod'
  location: location
  identity: {
    type: 'UserAssigned'
    userAssignedIdentities: { '${uami.id}': {} }
  }
  properties: {
    serverFarmId: plan.id
    keyVaultReferenceIdentity: uami.id // which identity resolves KV references
    siteConfig: {
      appSettings: [
        {
          name: 'Db__ConnString'
          value: '@Microsoft.KeyVault(SecretUri=https://kv-plat-prod-001.vault.azure.net/secrets/orders-db-conn/)'
        }
      ]
    }
  }
}
The SecretUri without a version (trailing /) resolves the current version. That single decision is the foundation of zero-downtime rotation in Step 6. Key Vault references use the system-assigned identity by default, so to resolve them with a UAMI you must set keyVaultReferenceIdentity; without it the platform looks for a system-assigned identity and the reference fails.
The Key Vault reference syntax has exactly two forms — know both and their reload behaviour:
| Reference form | Resolves | Reloads on rotation? | Use it when |
|---|---|---|---|
| SecretUri=…/secrets/<name>/ (no version) | Current version | Yes (on restart + periodic refresh) | Default; enables rotation |
| SecretUri=…/secrets/<name>/<version> | That pinned version | No (frozen) | Almost never; reintroduces rotation outages |
| VaultName=…;SecretName=… (alt syntax) | Current version | Yes | Older syntax; prefer SecretUri |
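A pinned version is easy to miss in review, so it is worth checking mechanically. The following is a minimal sketch, assuming a hypothetical `is_versionless` helper that handles only the SecretUri form; the version string in the example is made up.

```python
import re
from urllib.parse import urlparse

# Matches the SecretUri form only; the VaultName=...;SecretName=... form is out of scope.
_REFERENCE = re.compile(r"^@Microsoft\.KeyVault\(SecretUri=(?P<uri>[^)]+)\)$")


def is_versionless(reference: str) -> bool:
    """True when a SecretUri-style Key Vault reference has no pinned version."""
    match = _REFERENCE.match(reference.strip())
    if not match:
        raise ValueError("not a SecretUri-style Key Vault reference")
    # Expected path shape: /secrets/<name>[/<version>]
    parts = [p for p in urlparse(match.group("uri")).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "secrets":
        raise ValueError("unexpected secret URI path")
    return len(parts) == 2


current = "@Microsoft.KeyVault(SecretUri=https://kv-plat-prod-001.vault.azure.net/secrets/orders-db-conn/)"
# Hypothetical version ID, for illustration only.
pinned = "@Microsoft.KeyVault(SecretUri=https://kv-plat-prod-001.vault.azure.net/secrets/orders-db-conn/0123456789abcdef0123456789abcdef)"

print(is_versionless(current))  # True
print(is_versionless(pinned))   # False
```

A check like this fits naturally in a pipeline lint step over exported app settings, failing the build when any reference is pinned.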
Step 3 — Workload identity federation: how the trust works
Federation lets Entra ID accept an OIDC token from an external issuer in exchange for an Entra access token — no client secret involved. You configure a federated identity credential (FIC) on either an app registration or a user-assigned managed identity. A FIC is a trust assertion with three fields that must all match the incoming token:
- issuer: the OIDC issuer URL (e.g. https://token.actions.githubusercontent.com, or your AKS cluster’s OIDC issuer URL)
- subject: the exact sub claim identifying the workload (a repo+branch, a repo+environment, or a Kubernetes service account)
- audience: for Entra this is api://AzureADTokenExchange
At runtime the external platform issues a short-lived OIDC token, the workload presents it to Entra ID’s token endpoint, Entra validates issuer/subject/audience against a configured FIC, and returns a normal access token. The OIDC token lives minutes; nothing durable is stored.
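On the wire, that exchange is an OAuth 2.0 client-credentials request in which the external OIDC token takes the place of a client secret. The sketch below builds the form body without sending it; `build_token_exchange` is a hypothetical helper, and the tenant ID, client ID, and token value are placeholders.

```python
from urllib.parse import parse_qs, urlencode


def build_token_exchange(tenant_id: str, client_id: str,
                         oidc_token: str, scope: str) -> tuple[str, str]:
    """Return (token endpoint URL, form-encoded body) for the FIC exchange."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "scope": scope,
        # The external token is presented as a JWT client assertion,
        # which is what replaces the stored client secret.
        "client_assertion_type": "urn:ietf:params:oauth:client-assertion-type:jwt-bearer",
        "client_assertion": oidc_token,
    }
    return url, urlencode(form)


url, body = build_token_exchange(
    tenant_id="22222222-2222-2222-2222-222222222222",  # placeholder
    client_id="11111111-1111-1111-1111-111111111111",  # placeholder
    oidc_token="<token issued by GitHub or AKS>",      # placeholder
    scope="https://vault.azure.net/.default",
)
print(url)
print(sorted(parse_qs(body)))
```

Note what is absent from the form: there is no client_secret field. In practice azure/login and the Azure SDKs perform this request for you.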
The three fields, what each does, and the exact failure when it is wrong:
| FIC field | What it is | Example | Failure if wrong |
|---|---|---|---|
| issuer | OIDC issuer URL (must match token iss) | https://token.actions.githubusercontent.com | AADSTS700211 issuer mismatch |
| subject | Exact sub claim of the workload | repo:contoso/orders-api:environment:prod | AADSTS700213 (older: AADSTS70021) no matching subject |
| audiences | Who the token is for (Entra fixed value) | api://AzureADTokenExchange | Token rejected / audience mismatch (AADSTS700212) |
| name | A label for the FIC (your choice) | gh-orders-prod-env | Cosmetic; must be unique on the object |
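When a sign-in fails, the quickest way to see which of the three fields drifted is to read them straight out of the token. The sketch below decodes a JWT payload without verifying the signature, which is acceptable for debugging only; `unverified_claims` is a hypothetical helper and the sample token is constructed in the example rather than captured from a real issuer.

```python
import base64
import json


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def unverified_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT checking the signature (debugging only)."""
    _header, payload, _signature = token.split(".")
    return json.loads(_b64url_decode(payload))


def _b64url_encode(obj: dict) -> str:
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Build a fake, unsigned token so the example runs offline.
sample_token = ".".join([
    _b64url_encode({"alg": "none", "typ": "JWT"}),
    _b64url_encode({
        "iss": "https://token.actions.githubusercontent.com",
        "sub": "repo:contoso/orders-api:environment:prod",
        "aud": "api://AzureADTokenExchange",
    }),
    "signature-placeholder",
])

claims = unverified_claims(sample_token)
for field in ("iss", "sub", "aud"):
    print(f"{field}: {claims[field]}")
```

Compare the printed values character by character with the FIC. Never use unverified claims for an authorization decision.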
Where you can host a FIC, and the trade-off:
| FIC host | Holds FICs? | Pros | Cons |
|---|---|---|---|
| App registration | Yes | Supports flexible FICs (claims matching, wildcards) | Two objects (app + SP) to manage |
| User-assigned MI | Yes | Single object; natural for AKS SAs | Exact-match subjects only (no wildcards yet) |
| System-assigned MI | No | — | Cannot federate; use a UAMI instead |
Limit: a single managed identity (or app) supports a maximum of 20 federated identity credentials. Plan subjects accordingly — one FIC per branch and per environment adds up fast. Flexible federated credentials (claims matching with wildcards) exist for GitHub/GitLab/Terraform Cloud on app objects if you outgrow exact-match.
The federation limits you will actually hit:
| Limit | Value | Consequence | Mitigation |
|---|---|---|---|
| FICs per app / UAMI | 20 | 21st create fails | Env-scope subjects; flexible FIC; one identity per trust boundary |
| OIDC token lifetime (GitHub) | ~minutes | Long jobs may need re-issue | SDK re-requests automatically |
| Entra access-token lifetime | ~60–90 min | Token expires mid-job | SDK refreshes via the FIC |
| Subject string length / format | Issuer-defined | Mismatch → 70021 | Copy the exact sub from a token dump |
| Flexible FIC issuers | GitHub/GitLab/TF Cloud (app only) | Not on UAMI | Use an app registration for wildcards |
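The 20-FIC ceiling is simple arithmetic, and it is worth doing before choosing an identity model. A minimal sketch, assuming one exact-match FIC per repository and environment and per repository and branch; `fic_count` and `over_limit` are hypothetical helpers.

```python
FIC_LIMIT = 20  # per app registration or UAMI, per the limit above


def fic_count(repos: int, environments: int, branches: int = 0) -> int:
    """Exact-match FICs needed when every repo federates each of its
    environments and branches against the same identity."""
    return repos * (environments + branches)


def over_limit(count: int, limit: int = FIC_LIMIT) -> bool:
    return count > limit


for repos, envs, branches in [(4, 3, 0), (4, 3, 2), (7, 3, 0)]:
    needed = fic_count(repos, envs, branches)
    status = "exceeds limit" if over_limit(needed) else "fits"
    print(f"{repos} repos x ({envs} envs + {branches} branches) = {needed}: {status}")
```

Seven repositories with three environments each already need 21 credentials on a shared identity, which is why the mitigations are one identity per trust boundary or a flexible FIC on an app registration.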
Step 4 — Federating GitHub Actions to Azure
This kills the AZURE_CREDENTIALS JSON secret that haunts so many pipelines. Create (or reuse) an app registration, then add a FIC whose subject pins the exact repo and ref.
APP_ID=$(az ad app create --display-name "gh-orders-deploy" --query appId -o tsv)
az ad sp create --id "$APP_ID"
The subject claim is where least privilege lives. Pin to a branch or a GitHub Environment — environment scoping is stronger because it lets you gate on approvals and environment protection rules:
# Environment-scoped: only the 'prod' environment of this repo can assume the identity
az ad app federated-credential create \
--id "$APP_ID" \
--parameters '{
"name": "gh-orders-prod-env",
"issuer": "https://token.actions.githubusercontent.com",
"subject": "repo:contoso/orders-api:environment:prod",
"audiences": ["api://AzureADTokenExchange"]
}'
Common subject formats — copy the one that matches how the workflow is triggered:
| Scenario | Subject | Strength |
|---|---|---|
| Branch push | repo:ORG/REPO:ref:refs/heads/main | Medium (no approvals) |
| Tag | repo:ORG/REPO:ref:refs/tags/v1.2.3 | Medium |
| Pull request | repo:ORG/REPO:pull_request | Low (any PR) |
| Environment (preferred) | repo:ORG/REPO:environment:prod | High (approvals + protection rules) |
| Reusable workflow | repo:ORG/REPO:job_workflow_ref:ORG/REPO/.github/workflows/x.yml@ref | High (pins the workflow) |
| Org-wide (flexible FIC) | claims match repository_owner == 'ORG' | Scales to many repos |
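Because subjects are matched character for character, generating them is safer than typing them. The sketch below covers the four simplest formats from the table; `github_subject` is a hypothetical helper, and the reusable-workflow and flexible forms are deliberately left out.

```python
def github_subject(org: str, repo: str, *, environment: str = "",
                   branch: str = "", tag: str = "",
                   pull_request: bool = False) -> str:
    """Build the sub claim GitHub issues for one trigger type."""
    chosen = [bool(environment), bool(branch), bool(tag), pull_request]
    if sum(chosen) != 1:
        raise ValueError("choose exactly one of environment, branch, tag, pull_request")
    prefix = f"repo:{org}/{repo}:"
    if environment:
        return prefix + f"environment:{environment}"
    if branch:
        return prefix + f"ref:refs/heads/{branch}"
    if tag:
        return prefix + f"ref:refs/tags/{tag}"
    return prefix + "pull_request"


print(github_subject("contoso", "orders-api", environment="prod"))
print(github_subject("contoso", "orders-api", branch="main"))
print(github_subject("contoso", "orders-api", tag="v1.2.3"))
print(github_subject("contoso", "orders-api", pull_request=True))
```

Feeding the same function into both the FIC definition and a pipeline check keeps the two sides from drifting apart.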
Grant the app’s service principal only the roles that deployment needs — scoped to the target resource group, never the subscription. Then the workflow needs the id-token: write permission and the azure/login action with no secret:
name: deploy-orders
on:
  push:
    branches: [main]
permissions:
  id-token: write # required to request the GitHub OIDC token
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: prod # must match the FIC subject 'environment:prod'
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v2
        with:
          client-id: ${{ vars.AZURE_CLIENT_ID }}
          tenant-id: ${{ vars.AZURE_TENANT_ID }}
          subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}
      - run: az webapp deploy --name app-orders-prod --resource-group rg-platform-prod --src-path ./app.zip --type zip
Note that AZURE_CLIENT_ID and friends are repository variables, not secrets: they are identifiers, not credentials, and leaking them grants nothing without the matching OIDC trust. The workflow elements that gate this, and what breaks without each:
| Workflow element | Purpose | If missing | Symptom |
|---|---|---|---|
| permissions: id-token: write | Lets the job request the GitHub OIDC token | No token minted | azure/login cannot get an assertion |
| permissions: contents: read | Checkout access | Checkout fails | Job fails before login |
| environment: prod | Adds environment:prod to the sub | Subject mismatch | AADSTS70021 if FIC is env-scoped |
| client-id (variable, not secret) | Identifies the app to Entra | Wrong/empty | AADSTS700016 app not found |
| azure/login@v2 | Performs the token exchange | (early v1 releases lack OIDC) | Falls back to secret-based login |
The GitHub-vs-secret comparison that justifies the migration:
| Aspect | AZURE_CREDENTIALS secret (old) | OIDC federation (new) |
|---|---|---|
| Stored credential | Long-lived JSON in every repo | None |
| Rotation | Manual, coordinated across repos | Nothing to rotate |
| Blast radius if leaked | Full SP access until revoked | Identifiers only; useless without trust |
| Scoping | One SP, broad | Per repo/branch/environment subject |
| Audit attribution | Shared SP for all repos | Per-FIC, per-environment sign-in |
| Approvals gate | No | Yes (environment protection rules) |
Step 5 — AKS workload identity
Inside the cluster, pod-managed identity is deprecated; Microsoft Entra Workload ID is the model. The cluster runs an OIDC issuer, and a mutating webhook injects a projected service-account token plus the environment variables the Azure SDKs expect. Enable both:
az aks update \
--name aks-plat-prod --resource-group rg-platform-prod \
--enable-oidc-issuer \
--enable-workload-identity
OIDC_ISSUER=$(az aks show -n aks-plat-prod -g rg-platform-prod \
--query "oidcIssuerProfile.issuerUrl" -o tsv)
Federate a UAMI to a specific Kubernetes service account. The subject is system:serviceaccount:<namespace>:<name> and the issuer is the cluster’s OIDC URL:
az identity federated-credential create \
--name fic-orders-sa \
--identity-name id-orders-api \
--resource-group rg-platform-prod \
--issuer "$OIDC_ISSUER" \
--subject "system:serviceaccount:orders:sa-orders" \
--audiences "api://AzureADTokenExchange"
Annotate the service account with the UAMI client ID, and label pods to opt in. The annotation tells the webhook which identity to broker; the pod label flips the workload into the webhook’s injection path.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-orders
  namespace: orders
  annotations:
    azure.workload.identity/client-id: "<APP_CLIENT_ID of id-orders-api>"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
  namespace: orders
spec:
  selector:
    matchLabels:
      app: orders-api   # illustrative label; a Deployment needs a selector that matches its pod template
  template:
    metadata:
      labels:
        app: orders-api
        azure.workload.identity/use: "true"   # opt this pod into the webhook
    spec:
      serviceAccountName: sa-orders
      containers:
        - name: orders-api
          image: acrplatprod.azurecr.io/orders-api:1.4.0
The five moving parts of AKS workload identity, and the failure when each is missing. This is the table to keep open when a pod can’t get a token:
| Part | What it does | If missing | How to confirm |
|---|---|---|---|
| --enable-oidc-issuer | Cluster issues OIDC tokens | No issuer URL to federate | az aks show --query oidcIssuerProfile.issuerUrl empty |
| --enable-workload-identity | Installs the mutating webhook | No env vars / token injected | Webhook pods absent in kube-system |
| FIC subject = system:serviceaccount:ns:name | Entra trusts that SA | AADSTS70021 | Compare FIC subject to the pod’s SA |
| SA annotation client-id | Tells webhook which identity | Webhook can’t broker | kubectl get sa -o yaml shows no annotation |
| Pod label azure.workload.identity/use: "true" | Opts the pod in | No env vars injected | kubectl exec … env \| grep AZURE_ empty |
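The confirm column can be gathered in one pass. The following is a minimal sketch, assuming kubectl and az are installed and authenticated against the right cluster and subscription; sa_subject and check_wi are hypothetical helper names, and the script only prints the values side by side rather than judging them.

```bash
#!/usr/bin/env bash
# Build the subject Entra expects for a Kubernetes service account.
sa_subject() { printf 'system:serviceaccount:%s:%s\n' "$1" "$2"; }

# check_wi NAMESPACE DEPLOYMENT IDENTITY RESOURCE_GROUP
# Prints the facts to compare by eye; it does not change anything.
check_wi() {
  local ns="$1" deploy="$2" identity="$3" rg="$4" sa
  sa=$(kubectl get deploy "$deploy" -n "$ns" \
        -o jsonpath='{.spec.template.spec.serviceAccountName}')
  echo "expected subject : $(sa_subject "$ns" "$sa")"
  echo "FIC subjects     :"
  az identity federated-credential list --identity-name "$identity" \
    --resource-group "$rg" --query "[].subject" -o tsv
  echo "SA client-id     : $(kubectl get sa "$sa" -n "$ns" \
    -o jsonpath='{.metadata.annotations.azure\.workload\.identity/client-id}')"
  echo "pod label 'use'  : $(kubectl get deploy "$deploy" -n "$ns" \
    -o jsonpath='{.spec.template.metadata.labels.azure\.workload\.identity/use}')"
}
```

With the names used in this section the call would be check_wi orders orders-api id-orders-api rg-platform-prod. An empty client-id or label line points at rows four and five of the table; a subject that differs from every FIC subject points at row three.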
The environment variables the webhook injects (your SDK reads these automatically):
| Variable | Value | Used by |
|---|---|---|
| AZURE_CLIENT_ID | The UAMI client ID | SDK to identify the identity |
| AZURE_TENANT_ID | Your tenant | SDK token request |
| AZURE_FEDERATED_TOKEN_FILE | Path to the projected SA token | SDK reads the assertion |
| AZURE_AUTHORITY_HOST | Entra login host | SDK token endpoint |
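To see what the SDK does with those four variables, the exchange can be reproduced by hand. This is a sketch of the OAuth client-credentials request with a federated assertion, assuming curl is available in the container; token_endpoint and exchange_token are hypothetical helper names, and the Key Vault scope is chosen only because this article is about the vault.

```bash
#!/usr/bin/env bash
# Build the v2.0 token endpoint from the injected authority host and tenant.
token_endpoint() {
  local host="${1%/}" tenant="$2"   # tolerate a trailing slash on the host
  printf '%s/%s/oauth2/v2.0/token\n' "$host" "$tenant"
}

# Exchange the projected service-account token for a Key Vault access token.
# Run inside a pod that the webhook has mutated.
exchange_token() {
  curl -sS -X POST "$(token_endpoint "$AZURE_AUTHORITY_HOST" "$AZURE_TENANT_ID")" \
    --data-urlencode "grant_type=client_credentials" \
    --data-urlencode "client_id=$AZURE_CLIENT_ID" \
    --data-urlencode "scope=https://vault.azure.net/.default" \
    --data-urlencode "client_assertion_type=urn:ietf:params:oauth:client-assertion-type:jwt-bearer" \
    --data-urlencode "client_assertion=$(cat "$AZURE_FEDERATED_TOKEN_FILE")"
}
```

A JSON response with an access_token means issuer, subject and audience all matched; an AADSTS error in the response body localises the failure to the federation hop before Key Vault is ever contacted.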
With DefaultAzureCredential, the SDK inside the pod now authenticates with zero config. If you prefer secrets mounted as files, layer the Azure Key Vault provider for Secrets Store CSI Driver, which also works in workload-identity mode:
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: spc-orders-kv
  namespace: orders
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"
    clientID: "<APP_CLIENT_ID of id-orders-api>"   # workload identity mode
    keyvaultName: "kv-plat-prod-001"
    tenantId: "<TENANT_ID>"
    objects: |
      array:
        - |
          objectName: orders-db-conn
          objectType: secret
Enable the add-on, with rotation turned on, for the existing cluster:
az aks enable-addons \
--addons azure-keyvault-secrets-provider \
--name aks-plat-prod --resource-group rg-platform-prod \
--enable-secret-rotation \
--rotation-poll-interval 2m
The two ways an AKS pod consumes a vault secret, side by side — DefaultAzureCredential vs CSI mount:
| Aspect | SDK + DefaultAzureCredential | CSI Secrets Store mount |
|---|---|---|
| How the app gets the value | Calls Key Vault at runtime | Reads a mounted file |
| Code change | Minimal (SDK call) | None (read a file path) |
| Rotation pickup | Per call / your cache | Polled at rotation-poll-interval |
| Network path | Pod → Key Vault (needs egress/PE) | Same, via the CSI driver pod |
| K8s Secret sync | No | Optional (secretObjects) |
| Best for | Apps already using the SDK | Legacy apps that expect files |
| Failure mode | Token/role errors surface in app | Mount fails → pod stuck ContainerCreating |
Step 6 — Rotation without downtime
Rotation breaks applications when code pins a version. The discipline is to reference secrets without a version and let the resolver follow the current one.
- App Service / Key Vault references: a versionless SecretUri (Step 2) re-resolves on app restart and on a periodic refresh (within 24 hours), so rotating the secret in the vault propagates without a redeploy.
- CSI driver: with --enable-secret-rotation, the provider polls the vault every rotation-poll-interval (default 2 minutes) and updates both the mounted files and any synced Kubernetes Secret. Mounted file content updates in place; apps that read the file per request pick it up automatically. Apps that read once at startup still need a signal: watch the file or subscribe to the rotation.
- Event-driven: Key Vault emits Microsoft.KeyVault.SecretNewVersionCreated to Event Grid. Wire that to a Function or webhook to trigger graceful cache invalidation or a rolling restart the moment a new version lands, rather than waiting on a poll interval.
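For the read-once-at-startup case, "watch the file" can be as simple as polling a content hash. The sketch below assumes sha256sum from GNU coreutils; secret_fingerprint and wait_for_rotation are hypothetical helper names. It hashes the content rather than watching the inode, on the assumption that the driver may replace the file rather than edit it in place.

```bash
#!/usr/bin/env bash
# Print a content hash of the mounted secret file.
secret_fingerprint() { sha256sum "$1" | cut -d' ' -f1; }

# wait_for_rotation FILE [INTERVAL_SECONDS] [MAX_CHECKS]
# Returns 0 as soon as the content changes; returns 1 after MAX_CHECKS
# unchanged polls (MAX_CHECKS=0 means poll forever).
wait_for_rotation() {
  local file="$1" interval="${2:-30}" max="${3:-0}" before now n=0
  before=$(secret_fingerprint "$file")
  while :; do
    sleep "$interval"
    now=$(secret_fingerprint "$file")
    if [ "$now" != "$before" ]; then
      return 0
    fi
    n=$((n + 1))
    if [ "$max" -gt 0 ] && [ "$n" -ge "$max" ]; then
      return 1
    fi
  done
}
```

A sidecar or wrapper script could run wait_for_rotation against the mounted path and then signal the application to reload; the path and the reload signal are specific to each app and are not prescribed here.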
How each consumer picks up a rotated secret, and the latency you should expect:
| Consumer | Pickup mechanism | Typical latency | App restart needed? | Gotcha |
|---|---|---|---|---|
| App Service KV reference | Restart + periodic refresh | Up to 24 hours (periodic refresh) | No (but restart is instant) | Pinned version never refreshes |
| CSI mount (per-request read) | Poll interval | rotation-poll-interval (2m default) | No | App must re-read the file |
| CSI mount (read at startup) | Poll updates file only | n/a until restart | Yes, or watch the file | Stale in-memory value |
| SDK + cached secret | Your cache TTL | Your design | No | Cache too long → stale; too short → throttle |
| Event Grid → Function | Event push | Seconds | Optional (you control) | Must build the handler |
| Hardcoded value anywhere | None | Never | Yes (redeploy) | This is the anti-pattern |
The golden rule: store the secret in exactly one place (the vault), reference it versionlessly everywhere, and treat rotation as a vault-side operation that consumers observe — never a coordinated multi-system deploy.
The Event Grid event types Key Vault emits, and what to wire each to:
| Event type | Fires when | Wire it to |
|---|---|---|
| SecretNewVersionCreated | A new secret version is created | Cache invalidation / rolling restart |
| SecretNearExpiry | Secret nears its expiry date | Rotation automation / alert |
| SecretExpired | Secret has expired | Page on-call; block deploys |
| CertificateNewVersionCreated | Cert renewed | Reload TLS listeners |
| CertificateNearExpiry / Expired | Cert lifecycle | PKI automation / alert |
| KeyNewVersionCreated | Key rotated | Re-wrap data-encryption keys |
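Wiring the secret events to a handler is one Event Grid subscription on the vault. A minimal sketch, assuming the Event Grid resource provider is registered and the endpoint is a webhook that completes Event Grid's validation handshake; kv_resource_id and subscribe_rotation_events are hypothetical helper names, and the subscription name kv-secret-rotation is illustrative.

```bash
#!/usr/bin/env bash
# Build the ARM resource ID of a vault.
kv_resource_id() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.KeyVault/vaults/%s\n' \
    "$1" "$2" "$3"
}

# subscribe_rotation_events SUBSCRIPTION_ID RESOURCE_GROUP VAULT ENDPOINT_URL
# Subscribes a webhook to the three secret lifecycle events from the table.
subscribe_rotation_events() {
  local sub="$1" rg="$2" vault="$3" endpoint="$4"
  az eventgrid event-subscription create \
    --name kv-secret-rotation \
    --source-resource-id "$(kv_resource_id "$sub" "$rg" "$vault")" \
    --endpoint "$endpoint" \
    --included-event-types \
      Microsoft.KeyVault.SecretNewVersionCreated \
      Microsoft.KeyVault.SecretNearExpiry \
      Microsoft.KeyVault.SecretExpired
}
```

Filtering with --included-event-types keeps certificate and key events away from a handler that only knows how to deal with secrets.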
Step 7 — Auditing and detecting orphaned secrets
You cannot claim “secret-free” without proving it. Two fronts: find the secrets you missed, and watch the vault you kept.
Find orphaned secrets. Sweep app settings and pipeline definitions for plaintext that should be a Key Vault reference or a federated identity:
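# App settings that are not KV references; review each one for an inline secret.
# In an interactive Bash session, run set +H first so the '!' is not treated as history expansion.
az webapp config appsettings list -n app-orders-prod -g rg-platform-prod \
  --query "[?!contains(value, '@Microsoft.KeyVault')].name" -o tsv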
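App registrations that still carry password credentials (federation candidates) are Entra objects, so enumerate them through Microsoft Graph, for example with az ad app list. For the Azure resources themselves, hunt the classic offenders across the estate with Resource Graph, starting with a web app inventory: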
az graph query -q "
resources
| where type == 'microsoft.web/sites'
| extend kind = tostring(kind)
| project name, resourceGroup, kind"
The estate-wide checks worth scripting into a weekly job:
| What to hunt | Where | Why it matters | Action |
|---|---|---|---|
| App settings without @Microsoft.KeyVault | App Service config | Inline secret instead of a reference | Convert to a KV reference |
| App registrations with passwordCredentials | Entra (Graph) | A federation candidate / leakable secret | Add a FIC, revoke the secret |
| SP secrets nearing expiry | Entra | Imminent outage when they lapse | Federate or rotate |
| Pinned-version SecretUri | App config | Breaks rotation silently | Drop the version segment |
| Vaults with access policies (not RBAC) | Key Vault | Escalation-prone authorization | Migrate to RBAC data plane |
| Vaults with public network + no firewall | Key Vault | Data plane on the internet | Add private endpoint / firewall |
| Key Vault Administrator on a runtime identity | RBAC | Massive over-grant | Downgrade to Secrets User |
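The first and fourth rows can be checked mechanically once the settings are exported. A sketch under two assumptions: a Key Vault reference is any value of the form @Microsoft.KeyVault(...), and a pinned version is a 32-character hexadecimal segment after /secrets/<name>/. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# True when an app-setting value is a Key Vault reference rather than a literal.
is_kv_reference() { [[ "$1" == "@Microsoft.KeyVault("*")" ]]; }

# True when a secret URI carries a version segment after /secrets/<name>/.
is_pinned_secret_uri() {
  local re='/secrets/[^/]+/[0-9a-fA-F]{32}'
  [[ "$1" =~ $re ]]
}

# Reads "name<TAB>value" lines on stdin and prints a verdict per setting.
classify_settings() {
  local name value
  while IFS=$'\t' read -r name value; do
    if ! is_kv_reference "$value"; then
      printf '%s\tnot-a-reference\n' "$name"
    elif is_pinned_secret_uri "$value"; then
      printf '%s\tpinned-version\n' "$name"
    else
      printf '%s\tok\n' "$name"
    fi
  done
}
```

One way to feed it is az webapp config appsettings list -n app-orders-prod -g rg-platform-prod --query "[].[name,value]" -o tsv | classify_settings. A not-a-reference verdict is a prompt to review, not proof of a secret, since plenty of settings are harmless literals.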
Diagnostic logs. Route Key Vault AuditEvent logs to Log Analytics so every data-plane access is queryable and retained:
az monitor diagnostic-settings create \
--name kv-audit \
--resource "/subscriptions/$SUB/resourceGroups/rg-platform-prod/providers/Microsoft.KeyVault/vaults/kv-plat-prod-001" \
--logs '[{"category":"AuditEvent","enabled":true}]' \
--workspace "/subscriptions/$SUB/resourceGroups/rg-obs/providers/Microsoft.OperationalInsights/workspaces/law-platform"
The Key Vault log categories and what each is the source of truth for:
| Category | Captures | Use it for |
|---|---|---|
| AuditEvent | Every data-plane op (get/set/delete) + caller identity | Who read which secret, and result |
| AzurePolicyEvaluationDetails | Policy evaluation on the vault | Compliance/governance audits |
| AllMetrics | Latency, availability, saturation | Health dashboards, capacity |
Alert on anomalies. A KQL alert for access from an unexpected identity or a spike in SecretGet denials catches both misconfiguration and intrusion:
AzureDiagnostics
| where ResourceType == "VAULTS" and OperationName == "SecretGet"
| where ResultType != "Success"
| summarize denials = count() by identity_claim_appid_g, bin(TimeGenerated, 15m)
| where denials > 10
The KQL you will reach for most — one query per question you ask during an incident or audit:
| Question | Operation filter | Key column | One-liner |
|---|---|---|---|
| Who is being denied secrets? | SecretGet, ResultType != Success | identity_claim_appid_g | summarize count() by appid |
| Who read this specific secret? | SecretGet, success | id_s (secret URI) | where id_s contains "orders-db-conn" |
| Sudden spike in reads (exfil)? | SecretGet | bin(TimeGenerated, 5m) | summarize count() by bin(…) |
| New/unexpected caller identity? | any | identity_claim_appid_g | distinct appid vs an allow-list |
| Secret deletions (destructive)? | SecretDelete | CallerIPAddress | where OperationName == "SecretDelete" |
| Access from outside expected IPs? | any | CallerIPAddress | where CallerIPAddress !in (…) |
Architecture at a glance
The diagram traces the credential path exactly as a workload travels it, left to right, with each numbered badge marking the precise hop where a passwordless flow fails. Read it as the secret-zero journey: a workload (a GitHub runner, an AKS pod, or an App Service) starts with no stored secret. Off-Azure, the external OIDC issuer mints a short-lived token whose sub claim names the workload; on-Azure, IMDS plays the same role. That token is presented to Entra ID, where a federated identity credential (or the managed identity itself) is matched on issuer/subject/audience and exchanged for a normal Entra access token. Only then does the workload reach the Key Vault data plane, where an RBAC data role (Key Vault Secrets User) gates whether it can read the secret value — which finally resolves the versionless reference the app consumes. The private endpoint on the right keeps that last hop off the internet.
Notice the badges cluster where trust is actually established and where it most often breaks: badge 1 on the issuer/subject (the AADSTS70021 subject-drift trap), badge 2 on the Entra FIC (issuer mismatch and the 20-FIC ceiling), badge 3 on the data-plane role grant (the Forbidden that means “no Secrets User,” not “wrong subject”), badge 4 on the versionless reference (a pinned version that silently never rotates), and badge 5 on the network path (a private endpoint whose DNS resolves to a public IP). The legend narrates each as symptom · confirm · fix — that is the whole diagnostic method: localise the failure to one hop, read the confirm command, apply the fix.
Real-world scenario
Meridian Retail runs a forty-service microservice platform on Azure: AKS for the runtime, App Service for a handful of legacy APIs, and GitHub Actions for every deployment. The platform team is six engineers; the estate spans three subscriptions in australiaeast. Their mandate from a post-incident review was blunt: after a contractor’s leaked personal access token was found to still hold deploy rights two months after offboarding, no long-lived deploy secret may exist anywhere in the estate within one quarter.
They started where the risk was highest — CI/CD. Every one of the forty repos carried the same AZURE_CREDENTIALS JSON secret for a single shared service principal. They federated each repo’s prod environment to one shared gh-deploy app registration, one FIC per repo. Within two weeks they hit the wall: the 21st az ad app federated-credential create failed with The number of federated identity credentials on the application has reached the maximum allowed value of 20. The instinct was to mint more app registrations — but that scatters role assignments and audit identity across dozens of principals, exactly the sprawl they were trying to kill.
The fix was to stop modelling identity per repo. They created one user-assigned managed identity per deployment tier (id-deploy-prod, id-deploy-nonprod) and adopted GitHub’s repository_owner claim instead of pinning each repo. Crucially, a plain sub match cannot express “any repo in this org,” so they switched to a flexible federated credential on an app registration, using a claimsMatchingExpression over the repository_owner claim, gated on the prod environment:
az ad app federated-credential create \
--id "$APP_ID" \
--parameters '{
"name": "gh-org-prod",
"issuer": "https://token.actions.githubusercontent.com",
"audiences": ["api://AzureADTokenExchange"],
"claimsMatchingExpression": {
"value": "claims['"'"'repository_owner'"'"'] eq '"'"'meridian'"'"' and claims['"'"'environment'"'"'] eq '"'"'prod'"'"'",
"languageVersion": 1
}
}'
One credential now covered every repo the org owned, gated on prod so approvals still applied. Forty FICs collapsed to two, role assignments lived on two identities, and sign-in logs attributed every deploy to one auditable principal.
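The inline quoting in that command is easy to get wrong by one apostrophe. One alternative is to write the body to a file with a quoted heredoc and pass the file path. The sketch below copies the expression verbatim from the command above; write_fic_body is a hypothetical helper name, and which claims the expression language accepts for a given issuer is an assumption to verify before relying on it.

```bash
#!/usr/bin/env bash
# Write the flexible FIC body to a file. The quoted heredoc delimiter ('JSON')
# disables shell expansion, so every quote below is literal.
write_fic_body() {
  local out="$1"
  cat > "$out" <<'JSON'
{
  "name": "gh-org-prod",
  "issuer": "https://token.actions.githubusercontent.com",
  "audiences": ["api://AzureADTokenExchange"],
  "claimsMatchingExpression": {
    "value": "claims['repository_owner'] eq 'meridian' and claims['environment'] eq 'prod'",
    "languageVersion": 1
  }
}
JSON
}

# Then pass the file instead of an inline string:
#   write_fic_body fic.json
#   az ad app federated-credential create --id "$APP_ID" --parameters fic.json
```

Keeping the body in a file also makes the trust assertion reviewable in a pull request, which an inline string buried in a script is not.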
The AKS side had its own trap. Three teams had copied a working SecretProviderClass but their pods kept failing with secrets that mounted empty. The platform on-call traced it to two distinct causes via the failure table: two teams had omitted the azure.workload.identity/use: "true" pod label (so the webhook never injected the token — kubectl exec … env | grep AZURE_ came back empty), and one team’s FIC subject read system:serviceaccount:orders:orders-sa while the deployment used serviceAccountName: sa-orders — a one-token mismatch that produced AADSTS70021, not a Key Vault error, which is why they had spent a day staring at vault RBAC.
By quarter end: every pipeline federated, the AZURE_CREDENTIALS secret deleted from all forty repos, AKS on workload identity with CSI rotation polling every two minutes, App Service on versionless references, and a nightly Microsoft Graph check that fails the build if any app registration still carries a password credential. The contractor-token class of incident became impossible: there was no longer a stored credential to leak. The lesson on the wall: “Federation subjects map to a trust boundary, not to a repository. Model the boundary first and the credential count takes care of itself.”
The migration as a timeline, because the order of moves is the lesson:
| Week | State | Action taken | Effect | What it should have been |
|---|---|---|---|---|
| 1 | 40 repos, shared AZURE_CREDENTIALS | Federate each repo’s prod env to one app | First repos go secretless | Sound start |
| 2 | 20 FICs created | 21st federated-credential create fails | Hit the 20-FIC ceiling | Anticipate the ceiling up front |
| 2 | Ceiling hit | Plan to mint more app registrations | Identity/audit sprawl looming | Don’t; model the boundary |
| 3 | Re-modelled | UAMI per tier + flexible FIC on repository_owner | 40 FICs → 2; one principal per tier | The correct design |
| 4 | AKS migration | Copy SecretProviderClass, pods mount empty | Two causes: missing label + subject typo | Use the failure table first |
| 4 | Diagnosed | Add pod label; fix FIC subject to match SA | Pods get tokens; secrets mount | n/a |
| 13 | Secret-free | Delete AZURE_CREDENTIALS; nightly Graph gate | Leak-class incident impossible | The destination |
Advantages and disadvantages
The passwordless model removes the credential you most fear leaking, but it relocates the complexity into trust configuration — which has its own sharp edges. Weigh it honestly:
| Advantages (why this model wins) | Disadvantages (why it bites) |
|---|---|
| No stored credential to leak — the highest-risk secret simply does not exist | Trust config (issuer/subject/audience) is exact-match and unforgiving — one typo → cryptic AADSTS70021 |
| Nothing to rotate — rotation becomes a vault-side event, not a coordinated deploy | The 20-FIC ceiling forces you to model trust boundaries, not just wire up repos |
| Per-workload, per-environment attribution in sign-in logs | Failures are vague by design — “no matching FIC” vs “access denied” confuses teams for hours |
| RBAC data plane gives least privilege down to a single secret | Two authorization planes (control vs data) — easy to grant the wrong one |
| Managed identity needs zero app config inside Azure (DefaultAzureCredential) | Off-Azure (CI, on-prem) you must federate; IMDS isn’t there to lean on |
| Private endpoint + RBAC keeps the data plane off the internet | Disabling public access without a private path locks out your own pipelines |
| Purge protection defeats a ransomware-style destroy | Purge protection is irreversible and blocks redeploys that recreate a vault name |
The model is right for any estate that runs workloads needing credentials — which is all of them — and especially for CI/CD and AKS where the long-lived deploy secret is the crown-jewel risk. It bites hardest on teams that model identity per repository (the FIC ceiling), teams new to the control/data-plane split (wrong-role grants), and anyone who flips network isolation before landing a private path. Every disadvantage is manageable — but only if you know it exists, which is the point of this article.
Hands-on lab
Stand up a vault with RBAC, attach a user-assigned identity, store a secret, grant least privilege, and read it back as that identity — all free-tier-friendly. Then reproduce the classic Forbidden failure and fix it. Run in Cloud Shell (Bash).
Step 1 — Variables and resource group.
RG=rg-kv-lab
LOC=australiaeast
KV=kv-lab-$RANDOM # globally-unique vault name
UAMI=id-kv-lab
az group create -n $RG -l $LOC -o table
SUB=$(az account show --query id -o tsv)
Step 2 — Create a vault with RBAC authorization (no access policies).
az keyvault create -n $KV -g $RG -l $LOC \
--enable-rbac-authorization true \
--sku standard -o table
Expected: a vault row; properties.enableRbacAuthorization = true.
Step 3 — Create a user-assigned identity and capture its principal ID.
az identity create -n $UAMI -g $RG -l $LOC -o table
PID=$(az identity show -n $UAMI -g $RG --query principalId -o tsv)
CID=$(az identity show -n $UAMI -g $RG --query clientId -o tsv)
echo "principalId=$PID clientId=$CID"
Step 4 — Seed a secret (as yourself — you need Secrets Officer). First grant yourself, then write:
ME=$(az ad signed-in-user show --query id -o tsv)
az role assignment create --assignee-object-id $ME --assignee-principal-type User \
--role "Key Vault Secrets Officer" \
--scope "/subscriptions/$SUB/resourceGroups/$RG/providers/Microsoft.KeyVault/vaults/$KV"
# wait a few seconds for RBAC to propagate, then:
az keyvault secret set --vault-name $KV --name demo-conn --value "Server=db;Pwd=p@ss" -o table
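"Wait a few seconds" is a guess, so a small retry wrapper makes the step repeatable. This is a generic sketch, not an az feature; retry is a hypothetical helper, and the attempt count and delay are arbitrary.

```bash
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "gave up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

In the lab this would wrap the write, for example retry 6 10 az keyvault secret set --vault-name $KV --name demo-conn --value "Server=db;Pwd=p@ss" -o table, so a Forbidden caused by a role assignment that has not propagated yet is retried instead of failing the step.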
Step 5 — Reproduce the Forbidden failure. The UAMI has no data role yet. Simulate its access check:
# This lists role assignments for the UAMI on the vault — expect EMPTY (the bug)
az role assignment list --assignee $PID \
--scope "/subscriptions/$SUB/resourceGroups/$RG/providers/Microsoft.KeyVault/vaults/$KV" -o table
Empty output is the root cause: a workload carrying this UAMI would get Forbidden on SecretGet, which surfaces as an empty Key Vault reference and a crash loop — not an obvious “denied” in the app.
Step 6 — Grant least privilege and confirm.
az role assignment create --assignee-object-id $PID --assignee-principal-type ServicePrincipal \
--role "Key Vault Secrets User" \
--scope "/subscriptions/$SUB/resourceGroups/$RG/providers/Microsoft.KeyVault/vaults/$KV"
az role assignment list --assignee $PID \
--scope "/subscriptions/$SUB/resourceGroups/$RG/providers/Microsoft.KeyVault/vaults/$KV" \
--query "[].roleDefinitionName" -o tsv
# Expected: Key Vault Secrets User
Any workload (App Service, AKS pod) carrying this UAMI can now read demo-conn via a versionless reference — with zero stored secret.
Validation checklist. You created an RBAC vault, attached a reusable identity, hit the exact Forbidden/empty-reference failure from a missing data role, and fixed it with least privilege. No secret was stored to authenticate. The steps mapped to what each proves:
| Step | What you did | What it proves | Real-world analogue |
|---|---|---|---|
| 2 | RBAC vault, no access policies | The escalation-safe model is one flag | Every new production vault |
| 4 | Grant yourself Secrets Officer | Control and data planes are separate | Seeding secrets from CI |
| 5 | UAMI has no role → empty result | The “empty KV reference” crash has a cause | The 02:00 crash-loop |
| 6 | Grant Secrets User, confirm | Least privilege is the fix, not Administrator | Hardening every workload identity |
Cleanup (avoid lingering charges and a soft-deleted name).
# Wait for the delete to finish; the vault must reach the soft-deleted state before it can be purged.
az group delete -n $RG --yes
# The vault soft-deletes; purge if you want the name back immediately (no purge protection here):
az keyvault purge --name $KV --no-wait 2>/dev/null || true
Cost note. A Standard vault has no hourly charge — you pay per 10,000 operations (fractions of a rupee for this lab). The UAMI is free. Deleting the resource group stops everything; the vault soft-deletes for 90 days unless purged.
Common mistakes & troubleshooting
This is the playbook — the part you bookmark. First as a scannable table you read mid-incident, then the entries that bite hardest in full.
| # | Symptom | Root cause | Confirm (exact cmd / portal path) | Fix |
|---|---|---|---|---|
| 1 | azure/login fails AADSTS70021 no matching FIC | Workflow environment/ref ≠ FIC subject | Compare workflow sub to az ad app federated-credential list --id <appId> | Correct the subject string to match exactly |
| 2 | AADSTS700211 no matching issuer | FIC issuer URL wrong / trailing slash | az ad app federated-credential list vs token iss | Match the token iss exactly (GitHub: no trailing slash) |
| 3 | AADSTS7000215 invalid client secret | Workflow still sends a secret, not OIDC | Workflow uses creds: JSON; no id-token: write | Add permissions: id-token: write; remove AZURE_CREDENTIALS |
| 4 | 21st FIC create fails “maximum value of 20” | 20-FIC ceiling on the app/UAMI | az ad app federated-credential list --id <appId> \| jq length | Env-scope subjects; flexible FIC; one identity per tier |
| 5 | App boots, KV-backed setting is empty, crash loop | Identity lacks Key Vault Secrets User | Env variables blade red error; az role assignment list --assignee <pid> empty | Grant Secrets User at vault scope |
| 6 | Forbidden on az keyvault secret show | No data role, or vault on access policies | az keyvault show --query properties.enableRbacAuthorization | Grant Secrets User (RBAC) or add access policy |
| 7 | KV reference resolves to public IP / times out | Vault firewall blocks, or private DNS wrong | Vault → Networking “selected networks”; nslookup <vault>.vault.azure.net | Add private endpoint + privatelink.vaultcore.azure.net zone; allow trusted services |
| 8 | AKS pod has no AZURE_* env vars | Missing pod label azure.workload.identity/use | kubectl exec … env \| grep AZURE_ empty | Add the label; restart the deployment |
| 9 | AKS pod gets token but AADSTS70021 | FIC subject ≠ system:serviceaccount:ns:name | Compare FIC subject to serviceAccountName | Fix subject to match the SA exactly |
| 10 | CSI mount stuck ContainerCreating | SecretProviderClass wrong vault/secret/clientID | kubectl describe pod events; CSI driver logs | Correct keyvaultName/objectName/clientID |
| 11 | Secret rotated but app still uses the old value | Pinned-version SecretUri, or read-once-at-startup | Grep config for a version segment in the URI | Drop the version; restart or watch the file |
| 12 | Can’t recreate a vault: name “already exists” | Soft-deleted vault holds the name | az keyvault list-deleted --query "[].name" | Recover it, or purge (if not purge-protected) |
| 13 | az keyvault purge fails Conflict | Purge protection blocks hard-delete | az keyvault show --query properties.enablePurgeProtection | Wait out retention; this is by design, not a bug |
| 14 | Deploy works on main but not on a tag | FIC subject pins a branch, not the tag | Token sub is ref:refs/tags/… | Add a tag-subject FIC or use a broader claim |
The expanded form, with the full reasoning for the entries that waste the most time:
1. azure/login fails with AADSTS70021 “No matching federated identity record found.”
Root cause: The OIDC token’s sub claim does not match any FIC subject. Most often the workflow lacks the environment: key (so the sub is ref:… not environment:prod), or a branch/tag/environment was renamed.
Confirm: Print the FIC subjects with az ad app federated-credential list --id "$APP_ID" --query "[].subject" and compare to how the workflow is actually triggered. Add a debug step to dump the token claims if unsure.
Fix: Make the subject string match the token exactly, including environment:prod when the job sets environment: prod. Subjects are case-sensitive and compared as exact strings, so a difference in case or a single stray character fails the match.
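The debug step mentioned under Confirm needs little more than base64. A minimal sketch, assuming GNU coreutils on the runner; jwt_payload is a hypothetical helper name, and it only decodes the payload without verifying the signature. The commented usage relies on the two ACTIONS_ID_TOKEN_* variables GitHub exposes to jobs with id-token: write, and assumes jq is installed.

```bash
#!/usr/bin/env bash
# Decode (not verify) the payload segment of a JWT and print it as JSON.
jwt_payload() {
  local seg
  # Take the middle segment and map base64url back to standard base64.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that base64url strips.
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do seg="${seg}="; done
  printf '%s' "$seg" | base64 -d
}

# Inside a workflow step:
#   token=$(curl -sS -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
#     "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=api://AzureADTokenExchange" | jq -r .value)
#   jwt_payload "$token" | jq '{iss, sub, aud}'
```

Printing only iss, sub and aud gives the three values Entra matches on and keeps the raw token out of the job log.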
4. The 21st az ad app federated-credential create fails: “maximum allowed value of 20.”
Root cause: The 20-FIC ceiling per app/UAMI, reached because identity was modelled per repo/branch.
Confirm: az ad app federated-credential list --id "$APP_ID" | jq length returns 20.
Fix: Stop pinning each repo. Use environment-scoped subjects, or a flexible federated credential matching repository_owner on an app registration, or one identity per trust boundary (deployment tier) rather than per repo. Minting more app registrations scatters audit identity — avoid it.
5. App boots but a Key Vault-backed app setting is empty and the app crash-loops.
Root cause: The app’s identity has no Key Vault Secrets User role (or no identity is enabled, or the SecretUri is wrong), so the reference resolves to nothing. The app never sees “denied” — it sees an empty connection string.
Confirm: Portal → Environment variables shows the reference with a red error; az webapp config appsettings list --query "[?contains(value,'KeyVault')]"; check az webapp identity show and az role assignment list --assignee <principalId>.
Fix: Enable the identity; grant Key Vault Secrets User; set keyVaultReferenceIdentity if multiple UAMIs are attached; verify the secret exists/enabled and the URI (drop any pinned version).
7. The Key Vault reference resolves to a public IP or times out behind a private endpoint.
Root cause: The vault is private but DNS resolves the public name, or the vault firewall blocks the caller.
Confirm: nslookup kv-plat-prod-001.vault.azure.net returns a public IP instead of the private endpoint IP; the vault’s Networking blade shows “selected networks” without your path.
Fix: Link the privatelink.vaultcore.azure.net private DNS zone to the VNet (group id vault); allow trusted Azure services on the firewall for App Service KV references; ensure the app’s outbound routes through the VNet.
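Reading the nslookup result comes down to whether the answer is a VNet address. The sketch below treats any RFC 1918 address as private, which assumes the private endpoint lives in a VNet using those ranges; is_private_ipv4 and vault_resolves_private are hypothetical helpers, and the second one assumes getent and awk are available on the machine doing the lookup.

```bash
#!/usr/bin/env bash
# True for RFC 1918 IPv4 addresses (10/8, 172.16/12, 192.168/16).
is_private_ipv4() {
  local a b rest
  IFS=. read -r a b rest <<< "$1"
  case "$a" in
    10)  return 0 ;;
    172) [ "$b" -ge 16 ] && [ "$b" -le 31 ] ;;
    192) [ "$b" -eq 168 ] ;;
    *)   return 1 ;;
  esac
}

# vault_resolves_private VAULT_NAME
# Resolves <vault>.vault.azure.net from the current host and checks the answer.
vault_resolves_private() {
  local ip
  ip=$(getent hosts "$1.vault.azure.net" | awk '{print $1; exit}')
  echo "resolved to: ${ip:-<nothing>}"
  is_private_ipv4 "$ip"
}
```

Run vault_resolves_private kv-plat-prod-001 from inside the VNet (a jump host, or the app's own console); running it from a laptop only tells you what the public internet sees.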
8 & 9. AKS pod can’t authenticate. Two distinct failures that look identical from the app:
8 — no AZURE_* env vars at all: the pod is missing azure.workload.identity/use: "true", so the webhook never injected the token. Confirm with kubectl exec … env | grep AZURE_ (empty). Fix: add the label, restart.
9 — env vars present but AADSTS70021: the FIC subject does not match the pod’s service account. Confirm by comparing the FIC subject to the deployment’s serviceAccountName. Fix: align the subject to system:serviceaccount:<ns>:<name> exactly.
11. A rotated secret is ignored; the app keeps using the old value.
Root cause: A pinned-version SecretUri (which never refreshes) or an app that reads the secret once at startup and caches it forever.
Confirm: Grep the config/Bicep for a version segment after /secrets/<name>/; check whether the app re-reads on each use.
Fix: Use a versionless SecretUri; for CSI mounts read the file per request; subscribe to SecretNewVersionCreated for an immediate signal, or restart on rotation.
12 & 13. Vault name conflicts and purge.
12 — “name already exists” on create: a soft-deleted vault still holds the name. az keyvault list-deleted to see it; recover with az keyvault recover, or az keyvault purge if it is not purge-protected.
13 — purge fails Conflict: purge protection is on and the retention window has not elapsed. This is by design — there is no override. Plan vault names so you do not need to recreate them.
Best practices
- RBAC for the data plane, always. Set
enableRbacAuthorization: trueand useKey Vault Secrets User/Officer/Administrator. Access policies let a control-plane role self-grant data access. - Least privilege, scoped to the secret where you can. A runtime workload gets
Secrets Userand nothing more — neverAdministratororContributor. Scope to the individual secret if the role plane allows it. - Federate, don’t store. Anything off-Azure (GitHub, GitLab, on-prem) gets a FIC, not a stored secret. Delete
AZURE_CREDENTIALSand SP passwords once federation is proven. - Model FIC subjects to a trust boundary, not a repository. One identity per deployment tier with environment-scoped or flexible subjects, so you never hit the 20-FIC ceiling or scatter audit identity.
- Prefer UAMI over system-assigned at scale. Grant Key Vault access to the UAMI once; every workload that carries it inherits access and survives blue/green.
- Reference secrets versionlessly. A versionless
SecretUriis the foundation of zero-downtime rotation; a pinned version anywhere reintroduces an outage. - Enable purge protection and soft-delete in production. They defeat accidental and malicious destruction; accept that purge protection is irreversible.
- Disable public network access and add a private endpoint — but land the private path (DNS zone, trusted services) before flipping the switch, or you lock out your own pipelines.
- Set `keyVaultReferenceIdentity` when multiple UAMIs are attached. Otherwise the platform cannot decide which identity resolves references, and they fail.
- Treat subjects as a contract. Renaming a GitHub Environment, branch, or Kubernetes service account silently breaks the FIC. Change them deliberately and update the FIC in lockstep.
- Route `AuditEvent` to Log Analytics and alert on anomalies. Denial spikes and unexpected caller identities catch both misconfiguration and intrusion.
- Gate the build on a secret-free invariant. A nightly Resource Graph sweep that fails if any app registration still carries a password credential keeps the estate honest.
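The secret-free gate in the last practice reduces to one predicate over an inventory of app registrations. A minimal sketch, assuming the sweep has already exported that inventory as a JSON array whose objects carry `displayName`, `appId`, and a `passwordCredentials` array (the shape of a Microsoft Graph application object); how the inventory is produced is left out, and the sample data is hypothetical:

```python
import json
from typing import List


def apps_with_passwords(apps: List[dict]) -> List[str]:
    """Return names of app registrations that still hold a password credential."""
    offenders = []
    for app in apps:
        # An empty or missing passwordCredentials array means no stored client secret.
        if app.get("passwordCredentials"):
            offenders.append(app.get("displayName") or app.get("appId") or "<unknown>")
    return offenders


def check_secret_free(inventory_json: str) -> int:
    """Return a process exit code: 0 if the estate is secret-free, 1 otherwise."""
    offenders = apps_with_passwords(json.loads(inventory_json))
    for name in offenders:
        print(f"FAIL: {name} still has a password credential")
    return 1 if offenders else 0


if __name__ == "__main__":
    # Hypothetical inventory for illustration only.
    sample = json.dumps([
        {"displayName": "deploy-prod", "appId": "app-1", "passwordCredentials": []},
        {"displayName": "legacy-ci", "appId": "app-2",
         "passwordCredentials": [{"keyId": "k1"}]},
    ])
    raise SystemExit(check_secret_free(sample))
```

Wiring the exit code into the nightly job is what turns the sweep into a gate rather than a report.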
The settings worth standardising across every vault and identity, with the value you want:
| Standard | Setting / control | Target value | Why |
|---|---|---|---|
| RBAC data plane | `enableRbacAuthorization` | `true` | Escalation-safe, scoped, PIM-capable |
| Purge protection | `enablePurgeProtection` | `true` | Defeats destructive delete |
| Soft-delete retention | `softDeleteRetentionInDays` | `90` | Maximum recovery window |
| Network | `publicNetworkAccess` | `Disabled` (+ PE) | Data plane off the internet |
| Default network action | `networkAcls.defaultAction` | `Deny` (+ trusted bypass) | Deny-by-default with platform exceptions |
| Runtime role | data role on workload identity | `Key Vault Secrets User` | Least privilege |
| Reference form | `SecretUri` | versionless (`…/`) | Zero-downtime rotation |
| KV ref identity | `keyVaultReferenceIdentity` | the chosen UAMI | Disambiguates multi-UAMI resolution |
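The vault-level rows of this table can be checked mechanically. A minimal sketch, assuming a vault's `properties` object has been fetched as a dict (for example from its ARM JSON); the property names come from the table, while the helper names and the case-insensitive string comparison are choices made for this illustration:

```python
from typing import Any, Dict, Tuple

# Target values taken from the standards table above.
TARGETS: Dict[str, Any] = {
    "enableRbacAuthorization": True,
    "enablePurgeProtection": True,
    "softDeleteRetentionInDays": 90,
    "publicNetworkAccess": "Disabled",
    "networkAcls.defaultAction": "Deny",
}


def _lookup(props: dict, dotted: str) -> Any:
    """Walk a dotted path such as networkAcls.defaultAction; None if any part is absent."""
    node: Any = props
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def _same(actual: Any, expected: Any) -> bool:
    # Compare strings case-insensitively; everything else by equality.
    if isinstance(actual, str) and isinstance(expected, str):
        return actual.lower() == expected.lower()
    return actual == expected


def audit_vault(props: dict) -> Dict[str, Tuple[Any, Any]]:
    """Return {setting: (actual, expected)} for every setting that misses its target."""
    return {
        key: (_lookup(props, key), want)
        for key, want in TARGETS.items()
        if not _same(_lookup(props, key), want)
    }
```

An empty result means the vault meets the standard; anything else names the drifted setting alongside its actual and expected values.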
Security notes
- Eliminate the credential, don’t just hide it. The strongest control is that no stored secret grants access at all — federation and managed identity achieve that. Where a secret is unavoidable, it lives in exactly one vault, read via a least-privilege data role.
- Separate control and data planes. Operators who manage the vault (`Key Vault Contributor`) should not automatically read secret values; data access is a separate `Secrets User`/`Officer` grant. RBAC enforces this split; access policies do not.
- Least privilege and scope. Grant `Secrets User` at the secret scope where possible, not the whole vault, and never grant `Administrator` to a runtime identity. Use PIM to make break-glass `Administrator` eligible, not standing.
- Network-isolate the data plane. Private endpoint plus `publicNetworkAccess: Disabled` keeps secret reads on the backbone. A firewall with `defaultAction: Deny` and trusted-services bypass admits only the platform integrations you need.
- Protect against destruction. Purge protection plus soft-delete defeats a ransomware-style destroy; diagnostic logs make every delete attributable.
- Pin and scan what pulls secrets. For AKS, pin image digests and scan images; the workload identity is only as trustworthy as the code it runs.
- Treat federation subjects as security boundaries. An over-broad subject (e.g. `pull_request` from any fork) is an access-grant; prefer environment-scoped subjects with approvals, and audit flexible-FIC claim expressions like any other policy.
- Watch the audit log. A spike in `SecretGet`, access from an unexpected `appid`, or reads from outside expected IPs is an early intrusion or misconfiguration signal; alert on it.
The security controls and what each defends against:
| Control | Mechanism | Defends against | Also prevents |
|---|---|---|---|
| Federation / managed identity | FIC, IMDS | Leaked long-lived secret | Rotation outages |
| RBAC data plane | `Secrets User`/`Officer` roles | Privilege escalation via control plane | Over-broad data access |
| Secret-scoped assignment | Role at `/secrets/<name>` | One identity reading every secret | Lateral access within a vault |
| Private endpoint + firewall | `publicNetworkAccess: Disabled` | Data plane exposed to the internet | Exfil from outside the VNet |
| Purge protection + soft-delete | Vault data-protection flags | Malicious/accidental destroy | Irrecoverable loss |
| Environment-scoped subjects | FIC subject + GH protection rules | Untrusted repos/forks deploying | Unapproved production deploys |
| `AuditEvent` + alerts | Diagnostic logs → KQL | Silent abuse | Undetected misconfiguration |
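The environment-scoped-subjects control can be reviewed with plain string handling, because a FIC subject is compared against the token's `sub` exactly. A minimal sketch, assuming GitHub's documented `sub` formats (`environment`, `ref`, `pull_request`); the function names are hypothetical and `ORG/REPO` is a placeholder:

```python
from typing import List, Optional


def github_subject(repo: str, environment: Optional[str] = None,
                   ref: Optional[str] = None, pull_request: bool = False) -> str:
    """Build the sub claim a GitHub job presents, given how the job is triggered."""
    if environment:
        # A job that names an environment gets the environment form of the subject.
        return f"repo:{repo}:environment:{environment}"
    if pull_request:
        return f"repo:{repo}:pull_request"
    if ref:
        return f"repo:{repo}:ref:{ref}"
    raise ValueError("need an environment, a ref, or pull_request=True")


def over_broad(subjects: List[str]) -> List[str]:
    """Flag configured subjects that admit any pull request instead of a gated environment."""
    return [s for s in subjects if s.endswith(":pull_request")]


def is_trusted(token_sub: str, fic_subjects: List[str]) -> bool:
    """Exact, case-sensitive comparison of the token's sub against configured subjects."""
    return token_sub in fic_subjects


if __name__ == "__main__":
    configured = ["repo:ORG/REPO:environment:prod", "repo:ORG/REPO:pull_request"]
    print(over_broad(configured))
    print(is_trusted(github_subject("ORG/REPO", environment="prod"), configured))
```

Reviewing the `over_broad` output for each identity is one way to audit subjects as access grants rather than as configuration detail.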
Cost & sizing
The bill for this whole pattern is dominated by operations, not capacity — which is why it is one of the cheapest security wins available.
- Key Vault is priced per operation, not per hour. A Standard vault charges per 10,000 secret operations (a few rupees per 10k); a Premium vault adds HSM-backed key operations at a higher per-op rate and a small per-key monthly charge for HSM-protected keys. Most secret-only workloads never leave Standard.
- Managed identities and FICs are free. UAMIs, system-assigned identities, and federated credentials cost nothing — the savings versus rotating and storing secrets are pure upside.
- The hidden cost is operation volume. Apps that read a secret on every request instead of caching it can generate millions of vault operations and a surprising bill; cache with a sane TTL (Key Vault references and the CSI driver do this for you).
- Private endpoints add a small hourly charge plus per-GB processing — modest, and the right trade for keeping the data plane private. Budget one private endpoint per vault per VNet that needs it.
- Log Analytics ingestion for `AuditEvent` is billed per GB: trivial for most vaults, but a high-traffic vault’s audit stream is worth a retention/sampling review.
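The operation-volume point is easiest to see as arithmetic. A minimal sketch, assuming a steady request rate and a hypothetical price of ₹2.5 per 10,000 operations (a stand-in for "a few rupees per 10k", not a quoted rate):

```python
import math


def monthly_vault_ops(requests_per_second: float, reads_per_request: float = 1.0,
                      days: int = 30) -> int:
    """Vault operations per month when every request reads the secret from the vault."""
    return round(requests_per_second * reads_per_request * 86_400 * days)


def cached_monthly_ops(ttl_seconds: int, instances: int = 1, days: int = 30) -> int:
    """Vault operations per month when each instance refetches once per TTL."""
    return instances * math.ceil(days * 86_400 / ttl_seconds)


def monthly_cost(ops: int, price_per_10k: float) -> float:
    """Cost of a month of operations at a given price per 10,000 operations."""
    return ops / 10_000 * price_per_10k


if __name__ == "__main__":
    PRICE = 2.5  # hypothetical INR per 10k operations
    uncached = monthly_vault_ops(50)    # 50 requests/s, one read each
    cached = cached_monthly_ops(300)    # one instance, 5-minute TTL
    print(uncached, monthly_cost(uncached, PRICE))
    print(cached, monthly_cost(cached, PRICE))
```

At 50 requests per second the uncached path is 129,600,000 operations a month, while a single instance with a five-minute TTL makes 8,640; the exact rupee figures depend entirely on the assumed price.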
A rough monthly picture and what drives each line:
| Cost driver | What you pay for | Rough INR / month | What it buys | Watch-out |
|---|---|---|---|---|
| Standard vault operations | Per 10k secret ops | ~₹20–200 (typical app) | The secret store itself | Per-request reads blow this up |
| Premium vault (HSM keys) | Higher per-op + per-key | ~₹400+ per HSM key | FIPS 140-2 L2 key protection | Only if you need HSM-backed keys |
| Managed identities / FICs | — | ₹0 | Passwordless auth | None |
| Private endpoint | Hourly + per-GB | ~₹400–900 each | Data plane off the internet | One per vault per VNet |
| Log Analytics (`AuditEvent`) | Per-GB ingestion | ~₹100–1,000 | Queryable audit trail | High-traffic vaults ingest more |
| Caching layer (your design) | — | ₹0 | Cuts operation count ~10–100× | Stale-vs-throttle TTL tuning |
The cache-vs-cost trade-off as a table — pick a TTL that fits the secret’s rotation cadence:
| Read pattern | Vault ops | Rotation latency | Cost | Use when |
|---|---|---|---|---|
| Per request, no cache | Very high | Instant | Highest | Almost never |
| Cache with short TTL (1–5 min) | Moderate | ≤ TTL | Low | Frequently-rotated secrets |
| KV reference / CSI poll | Low | Refresh/poll interval | Low | App Service / AKS default |
| Cache + Event Grid invalidation | Lowest | Seconds (event-driven) | Lowest | Rotation-sensitive, high-traffic |
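The short-TTL and event-invalidated rows share one small mechanism. A minimal sketch, assuming the real vault read is supplied as a `fetch` callable (a hypothetical stand-in for whatever SDK call or file read the app uses) and that an Event Grid handler, not shown, calls `invalidate`; the class is not thread-safe as written:

```python
import time
from typing import Callable, Dict, Tuple


class TtlSecretCache:
    """Cache secret values for ttl_seconds; refetch on expiry or explicit invalidation."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # hypothetical stand-in for the real vault read
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[str, float]] = {}
        self.fetch_count = 0         # number of vault operations actually made

    def get(self, name: str) -> str:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]          # still fresh: no vault operation
        value = self._fetch(name)
        self.fetch_count += 1
        self._entries[name] = (value, now + self._ttl)
        return value

    def invalidate(self, name: str) -> None:
        """Drop a cached value, e.g. when a new-version event arrives for that secret."""
        self._entries.pop(name, None)
```

With TTL alone, rotation latency is bounded by `ttl_seconds`; adding the `invalidate` hook is what moves a workload from the second row of the table to the last.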
Interview & exam questions
1. What is the secret-zero problem and how do managed identity and federation solve it? Secret-zero is the bootstrap credential: to read a secret you must authenticate, and if that authentication is itself a stored secret you have only moved the problem upstream. Managed identity (inside Azure) and workload identity federation (outside) solve it by having the platform issue a short-lived token that Entra ID trusts, so no durable credential is stored anywhere.
2. Why prefer Azure RBAC over access policies for a Key Vault data plane? Access policies are a flat per-vault list, and anyone with vaults/write (Contributor) can self-grant data access — a privilege-escalation path. RBAC separates control-plane from data-plane permissions, supports scoping to a single secret, integrates with PIM, and is the default for new vaults. A runtime workload should get Key Vault Secrets User and nothing more.
3. A GitHub Actions deploy fails with AADSTS70021. What’s wrong and how do you confirm? The OIDC token’s sub claim does not match any federated identity credential’s subject. Usually the workflow sets environment: prod but the FIC subject pins a branch (or vice versa), or something was renamed. Confirm by listing FIC subjects (az ad app federated-credential list) and comparing to how the job is triggered. Fix the subject to match exactly; subjects are compared as exact, case-sensitive strings.
4. What are the three fields of a federated identity credential, and what does each match? Issuer (the OIDC issuer URL, matched against the token’s iss), subject (the exact sub claim identifying the workload — a repo+environment or a Kubernetes service account), and audience (for Entra, always api://AzureADTokenExchange). All three must match the incoming token exactly or Entra returns “no matching federated identity,” not “access denied.”
5. When do you use a system-assigned versus a user-assigned managed identity? System-assigned when the identity should live and die with one resource (a standalone service). User-assigned (UAMI) when many workloads share access or the identity must survive blue/green replacement — you grant Key Vault RBAC to the UAMI once and every workload that carries it inherits access. AKS workload identity and external federation require a UAMI (or app), since system-assigned identities cannot hold FICs.
6. An app boots but a Key Vault-backed setting is empty and it crash-loops, with no exception. What do you check? A Key Vault reference resolved to nothing because the identity isn’t enabled, lacks Key Vault Secrets User, the vault firewall blocks it, or the SecretUri is wrong. Check the Environment variables blade for a red reference error, az webapp identity show, and az role assignment list --assignee <principalId>. With multiple UAMIs attached, also set keyVaultReferenceIdentity.
7. Why might an AKS pod fail to get a token even though workload identity is enabled? Two distinct causes: the pod is missing the azure.workload.identity/use: "true" label, so the webhook never injects the env vars/token (kubectl exec … env | grep AZURE_ is empty); or the FIC subject doesn’t match the pod’s service account (env vars present but AADSTS70021). Fix the label or align the subject to system:serviceaccount:<ns>:<name>.
8. You hit “maximum allowed value of 20” creating federated credentials. What now? You hit the 20-FIC ceiling because identity was modelled per repo/branch. Don’t mint more app registrations (that scatters audit identity). Consolidate with environment-scoped subjects, or a flexible federated credential matching repository_owner on an app registration, or one identity per deployment tier — model the trust boundary, not the repository.
9. How do you rotate a secret with zero downtime? Reference it versionlessly everywhere (a SecretUri ending in /), store it in exactly one vault, and let consumers follow the current version: App Service KV references re-resolve on restart/refresh, the CSI driver polls at rotation-poll-interval, and SecretNewVersionCreated via Event Grid can trigger immediate invalidation. Never pin a version or hardcode the value — that reintroduces a coordinated-deploy outage.
10. What does purge protection do, and what’s the catch? It blocks even a privileged actor from hard-deleting a vault or secret before the soft-delete retention window elapses, defeating a ransomware-style destroy. The catch: it is irreversible once enabled, and it blocks redeploys that try to recreate the same vault name within retention — so name vaults deliberately and don’t enable it in throwaway environments you recreate often.
11. How do you keep a Key Vault’s data plane off the internet without breaking App Service references? Set publicNetworkAccess: Disabled and add a private endpoint with the privatelink.vaultcore.azure.net DNS zone linked to the VNet, and enable the firewall’s trusted Azure services bypass so platform integrations (App Service KV references) still resolve. Land that private path before disabling public access, or you lock out your own pipelines.
12. What’s the difference between a control-plane 403 and a data-plane Forbidden on Key Vault? A control-plane 403 (e.g. on vaults/write) means the caller lacks an Azure RBAC management role such as Key Vault Contributor. A data-plane Forbidden (e.g. on secrets/getSecret/action) means it lacks a data-plane grant: a data role such as Key Vault Secrets User, or an access policy on a vault that still uses that model. The two planes are authorized separately, so granting a role on the wrong one is the classic mistake.
These map to AZ-500 (Security Engineer) — manage Key Vault, secrets, keys, certificates; configure managed identities; workload identity — and AZ-204 (Developer) — secure app configuration data, implement managed identities and Key Vault references. The federation and AKS angles touch AZ-400 and the Kubernetes specialty. A compact cert-mapping for revision:
| Question theme | Primary cert | Exam objective area |
|---|---|---|
| Key Vault RBAC vs access policies | AZ-500 | Secure data and applications; Key Vault |
| Managed identity (system vs user) | AZ-500 / AZ-204 | Implement and manage identities for resources |
| FIC fields, GitHub OIDC | AZ-400 / AZ-500 | Secure pipelines; workload identity federation |
| AKS workload identity | AKS specialty / AZ-500 | Secure Kubernetes workloads |
| KV references, rotation | AZ-204 | Secure app configuration data |
| Networking, private endpoint | AZ-500 / AZ-700 | Secure the data plane; private connectivity |
Quick check
- To read a secret from Key Vault a workload must authenticate to Entra ID. What is the name of the problem where that authentication itself needs a stored credential, and what mechanism removes it?
- A GitHub Actions job sets `environment: prod` but `azure/login` fails `AADSTS70021`. Where is the mismatch, and what one command shows you the configured value to compare against?
- True or false: granting a runtime web app `Key Vault Contributor` is the correct least-privilege way to let it read a secret.
- Your AKS pod has none of the `AZURE_*` environment variables. What single piece of Kubernetes YAML is almost certainly missing?
- You rotated a secret in the vault but the App Service still uses the old value. Name the most likely cause in the reference URI.
Answers
- The secret-zero problem. It is removed by platform-issued identity — a managed identity (inside Azure, via IMDS) or workload identity federation (outside, via an OIDC token Entra trusts) — so no durable credential is stored.
- The FIC subject does not match the token’s `sub`. The job’s `sub` is `repo:ORG/REPO:environment:prod`, so the FIC subject must be exactly that. Confirm the configured value with `az ad app federated-credential list --id <appId> --query "[].subject"`.
- False. `Key Vault Contributor` is a control-plane role (manage the vault) and lets the identity self-grant more access. The least-privilege data-plane role to read secret values is `Key Vault Secrets User`.
- The pod (template) label `azure.workload.identity/use: "true"`. Without it the mutating webhook does not inject the projected token or the `AZURE_*` env vars, so the SDK has nothing to exchange.
- A pinned secret version in the `SecretUri` (a version segment after `…/secrets/<name>/`). A pinned version never refreshes; use a versionless URI ending in `/` so the current version is resolved.
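The pod-label answer, together with the service-account subject the federation depends on, can be sketched as a check over a pod manifest. A minimal sketch, assuming the manifest has been loaded into a dict with the usual `metadata` and `spec` keys and the FIC subjects have been listed separately; the function names and the sample namespace and service account are hypothetical:

```python
from typing import List

USE_LABEL = "azure.workload.identity/use"


def expected_fic_subject(namespace: str, service_account: str) -> str:
    """Subject a FIC must carry for this service account."""
    return f"system:serviceaccount:{namespace}:{service_account}"


def diagnose_pod(pod: dict, fic_subjects: List[str]) -> List[str]:
    """Return findings for the two distinct failure causes; an empty list means neither applies."""
    findings = []
    metadata = pod.get("metadata", {})
    labels = metadata.get("labels") or {}
    if labels.get(USE_LABEL) != "true":
        # Without the label the webhook injects nothing, so AZURE_* env vars are absent.
        findings.append(f'missing label {USE_LABEL}: "true"')
    subject = expected_fic_subject(
        metadata.get("namespace", "default"),
        pod.get("spec", {}).get("serviceAccountName", "default"),
    )
    if subject not in fic_subjects:
        # Env vars may be present, but the token exchange finds no matching FIC.
        findings.append(f"no FIC subject equals {subject}")
    return findings


if __name__ == "__main__":
    pod = {
        "metadata": {"namespace": "payments", "labels": {"app": "api"}},
        "spec": {"serviceAccountName": "api-sa"},
    }
    for finding in diagnose_pod(pod, ["system:serviceaccount:payments:api-sa"]):
        print(finding)
```

Keeping the two findings separate mirrors the two symptoms: empty `AZURE_*` variables point at the label, while `AADSTS70021` with the variables present points at the subject.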
Glossary
- Key Vault — Azure’s managed store for secrets, keys, and certificates, with a control plane (manage the vault) and a data plane (read/write values).
- Secret-zero — the bootstrap credential you would need to authenticate in order to read a secret; the problem managed identity and federation eliminate.
- Managed identity — a platform-minted, platform-rotated identity for an Azure resource, exposed via IMDS; needs no stored secret.
- System-assigned identity — a managed identity whose lifecycle is tied 1:1 to a single resource (created and deleted with it).
- User-assigned managed identity (UAMI) — a standalone, reusable managed-identity resource attached to many workloads; can hold federated credentials.
- IMDS — the Instance Metadata Service (`169.254.169.254`) that issues an in-Azure workload its managed-identity token.
- Workload identity federation — configuring Entra ID to trust an external OIDC issuer’s token in exchange for an Entra access token, with no stored secret.
- Federated identity credential (FIC) — the trust assertion (issuer, subject, audience) on an app or UAMI that Entra matches against an incoming OIDC token.
- Issuer / subject / audience — the three FIC fields: the OIDC issuer URL, the exact `sub` claim of the workload, and `api://AzureADTokenExchange` for Entra.
- Access policy — the legacy, escalation-prone, per-vault data-plane permission model; superseded by RBAC.
- RBAC data role — `Key Vault Secrets User`/`Officer`/`Administrator` (and crypto/cert equivalents) granting least-privilege data-plane access.
- Key Vault reference — an app setting of the form `@Microsoft.KeyVault(SecretUri=…)` resolved at boot by the app’s managed identity.
- `keyVaultReferenceIdentity` — the site setting that tells App Service which attached identity resolves Key Vault references when more than one is present.
- CSI Secrets Store — the Kubernetes provider that mounts Key Vault secrets as files in a pod, working in workload-identity mode.
- Versionless URI — a `SecretUri` ending in `/` (no version segment) that resolves the current version, enabling zero-downtime rotation.
- Purge protection — an irreversible vault setting blocking hard-delete before the soft-delete retention window elapses.
- Soft-delete — always-on recovery of a deleted vault/secret within a retention window (7–90 days).
- Flexible federated credential — a FIC on an app registration that matches token claims with expressions/wildcards (e.g. `repository_owner`), beyond exact-subject matching.
Next steps
You can now stand up a secret-free path end to end and diagnose where a passwordless flow breaks. Build outward:
- Next: Azure Key Vault: Secrets, Keys & Certificates — go deep on the data-plane objects you are protecting and their lifecycle.
- Related: Azure Key Vault Secret Rotation with Managed Identity — automate the rotation half with managed identity and Event Grid.
- Related: Entra Managed Identities Deep Dive: User-Assigned, FIC & RBAC — the identity objects and federated credentials in full.
- Related: AKS Secrets Store CSI: Key Vault Sync & Rotation — the file-mount path for cluster workloads, with rotation.
- Related: GitHub Actions OIDC: Keyless Deploys to Multi-Cloud — generalise federation across Azure, AWS, and GCP from one workflow.
- Related: Azure Private Endpoints & Private DNS at Scale — get the vault’s private network path right so references never resolve to a public IP.