Secrets Store CSI Driver on AKS: Mounting Key Vault Secrets with Rotation and K8s Sync

The Secrets Store CSI Driver lets a pod mount Azure Key Vault secrets, keys, and certificates as files on a tmpfs volume, with no secret material written to etcd by default. On AKS it ships as a first-party add-on (azure-keyvault-secrets-provider): a managed DaemonSet, an Azure provider plugin, and platform-owned lifecycle rather than a Helm chart you babysit. You annotate a service account, write a SecretProviderClass, mount a CSI volume, and Key Vault material appears at /mnt/secrets-store inside the container — passwordless, auditable, and rotatable.

The interesting engineering is not the mount; it is the identity backing the mount and the rotation semantics once secrets change underneath running pods. Three facts trip up almost everyone the first time. The driver only fetches secrets when a pod mounts a volume — no pod, no Secret. The synced Kubernetes Secret is created on first mount and garbage-collected when the last consuming pod dies. And rotation reaches a file or a mounted Secret volume on the next poll, but an environment variable injected at container start never changes, so an app reading config from env keeps the old value forever unless you force a restart. Miss any of these and you ship an incident.

This walkthrough wires the add-on to Microsoft Entra Workload ID (federated, no client secrets), authors a SecretProviderClass, syncs the mounted objects into a native Kubernetes Secret for env-var consumption, terminates TLS at ingress from a Key Vault certificate, and turns on auto-rotation — with a clear-eyed view of what actually propagates and what does not. By the end you will read the symptom (ContainerCreating forever, AADSTS70021, an empty synced Secret, a rotated value that never reaches the app) and name the exact cause and fix in under a minute. Throughout, I prefer workload identity over the add-on’s auto-created managed identity: the add-on identity is a node-scoped, cluster-wide credential, while workload identity scopes access to a single service account — what you want for least privilege and clean auditing. The mechanics here build directly on Azure Key Vault with Workload Identity for secretless secrets and the broader Kubernetes ConfigMaps and Secrets deep dive.

What problem this solves

The naïve way to get a database password into a pod is a Kubernetes Secret you kubectl create from a literal, or worse, a value baked into a Helm values.yaml checked into git. Both put plaintext (base64 is not encryption) in etcd and in your version control, readable by anyone with get secret RBAC or repo access. There is no rotation story, no audit trail of which workload read which secret when, and no separation between the team that owns the secret and the team that runs the cluster. When an auditor asks “prove that pod A cannot read pod B’s database credential,” you have no answer.

The Secrets Store CSI Driver moves the source of truth to Key Vault and the identity to Entra. The secret lives in a vault with its own RBAC, versioning, soft-delete, purge protection, and diagnostic logs that record every SecretGet with the calling identity. The pod authenticates with a federated token — no client secret is ever issued or stored. Mounted material lands on tmpfs (memory-backed), never on disk, never in etcd unless you explicitly opt into the K8s sync. Rotation happens in the vault and the driver pulls the new version on a poll. The blast radius of a compromised pod shrinks from “the entire secret store” to “exactly the secrets that one service account was granted.”

Who hits the pain this solves: any team running multi-tenant AKS where squads share a cluster but must not share secrets; anyone failing a PCI/SOC2/ISO audit on secret attribution; teams whose “rotation” today means a manual kubectl edit secret and a prayer; and anyone who has shipped a credential to a public registry or git history. The cost of getting it wrong is concrete — a single shared identity means one popped pod reads everything, and your audit log attributes nothing.

To frame the field before the deep dive, here is every failure class this article covers, where it bites on the path, and the one place to look first:

Symptom class	What you actually see	First question to ask	First place to look	Most common single cause
Pod stuck `ContainerCreating`	Pod never reaches `Running`	Did the volume mount fail?	`kubectl describe pod` events	Federation subject mismatch or missing RBAC
`AADSTS70021`	Event: “No matching federated identity record found”	Does the FIC subject match the SA?	`az identity federated-credential list`	`--subject` typo vs `system:serviceaccount:<ns>:<sa>`
`403 Forbidden` from Key Vault	Event: “does not have secrets get permission”	Is the RBAC role correct for the object type?	`az role assignment list --scope <kv>`	Wrong role (Secrets vs Certificate User)
Empty synced Secret	`kubectl get secret` exists but `data: {}`	Does `objectName` match the filename?	Compare SPC `objectName` to mount `ls`	Wired to KV name, not the mounted alias
Rotated value never reaches app	New vault value, old app behaviour	How does the app read the secret?	Consumption pattern (file vs env)	Env var snapshot at container start
Synced Secret vanished	Ingress 503, TLS gone	Did the last consumer pod die?	`kubectl get pods` referencing the SPC	GC when last mounting pod deleted

Learning objectives

By the end of this article you can:

Enable the azure-keyvault-secrets-provider add-on and choose deliberately between the auto-created managed identity and Workload ID, knowing the security trade-off of each.
Federate a user-assigned managed identity to a Kubernetes service account with a correct --subject, and explain why a typo there surfaces as AADSTS70021 at pod start, not at apply time.
Assign the correct Key Vault RBAC role for each object type (secret, key, cert) and explain why key and cert both need Certificate User.
Author a SecretProviderClass whose nested objects block, objectAlias, and secretObjects mapping are all internally consistent — and debug the empty-synced-Secret trap.
Enable auto-rotation, tune the poll interval, and predict precisely which consumption patterns (file, mounted Secret volume, env var) pick up a rotation and which need a pod restart.
Terminate TLS at an ingress controller from a Key Vault certificate that rotates, and keep the synced kubernetes.io/tls Secret alive against garbage collection.
Drive the verification and observability surface: mount listing, synced-Secret inspection, add-on config readout, and the driver’s Prometheus rotation/sync metrics.
Run the symptom→cause→confirm→fix playbook for every common failure mode without guessing.

Prerequisites & where this fits

You should be comfortable with AKS fundamentals — a cluster, kubectl, namespaces, and the difference between a Deployment and a Pod. You should understand what a Kubernetes Secret is and that base64 is encoding, not encryption (the ConfigMaps and Secrets deep dive covers this). You should know what a managed identity is at a basic level — see Entra managed identities deep dive — and have a Key Vault you can grant RBAC on. Familiarity with Kubernetes RBAC and service accounts makes the federation step click.

This sits in the Security / platform layer of an AKS deep-dive track. Upstream of it: Azure Key Vault secrets, keys, and certificates (what is in the vault) and Key Vault with Workload Identity (the auth pattern). Adjacent: Azure Key Vault secret rotation with managed identity (the vault-side rotation event this article consumes) and Kubernetes ingress controllers, TLS and routing (where the TLS cert pattern lands). When mounts fail in ways the add-on cannot explain, the general Kubernetes troubleshooting methodology is your fallback.

A quick map of who owns what during an incident, so you call the right person fast:

Layer	What lives here	Who usually owns it	Failure classes it can cause
Key Vault (data plane)	Secrets, keys, certs, RBAC, firewall	Security / platform	403, object-not-found, firewall block
Entra (identity)	UAMI, federated credential, tenant	Identity team	`AADSTS70021`, `AADSTS700016`
AKS control plane	OIDC issuer, workload-identity webhook	Platform / SRE	Token not projected, webhook off
CSI driver + provider	DaemonSet in `kube-system`, the mount	Platform (add-on)	Mount hang, provider crash, rotation lag
`SecretProviderClass` (CRD)	Namespaced config, objects, sync map	App team	Empty Secret, wrong alias, parse error
Pod / Deployment	SA name, label, volume, env wiring	App / dev team	Missing label, wrong SA, env staleness

Core concepts

Five mental models make every later diagnosis obvious.

The mount is pull-based and identity-gated. The driver does nothing until a pod mounts a CSI volume that references a SecretProviderClass. At mount time the Azure provider exchanges the pod’s projected service-account token for an Entra token (via the federated credential), calls the Key Vault data plane, and writes the returned objects to a tmpfs volume. No pod means no fetch; a bad identity means the mount fails and the pod never starts. This is the single most important behavioural fact: the secret’s availability is coupled to a running pod and a working identity.

Workload Identity is federation, not a stored secret. A user-assigned managed identity (UAMI) is bound to a Kubernetes service account by a federated identity credential (FIC) whose subject is system:serviceaccount:<namespace>:<name> and whose issuer is the cluster’s OIDC issuer URL. When a labelled pod runs, the webhook projects a short-lived OIDC token; Entra trusts it because the FIC says “this issuer + this subject = this identity.” No client secret exists anywhere. The audience is the fixed string api://AzureADTokenExchange.

The Key Vault object model is not the intuitive crypto/secret split. A Key Vault certificate is internally backed by both a key and a secret. The provider retrieves key and cert object types through the certificate path, so both require Key Vault Certificate User, not Key Vault Crypto User, which covers crypto operations the provider never performs. Mounting a full PEM chain (cert + private key) uses objectType: secret against the certificate, which needs Key Vault Secrets User. Getting this wrong yields a 403 at mount time, not at apply time.

The synced Kubernetes Secret has a driver-owned lifecycle. The secretObjects block mirrors mounted files into a real Secret, but that Secret is created on first mount (not on SPC apply) and garbage-collected when the last consuming pod is deleted. Anything that reads the Secret before the first pod (a Helm pre-install hook, an unrelated Deployment) gets not-found; anything that depends on it persisting after the pods are gone (an ingress controller) breaks when it vanishes. The driver owns it; do not treat it as an independent object.

Rotation lag is real and consumption-dependent. With rotation on, the driver polls every interval (default 2 minutes) and, on a detected change, updates both the mounted file and the synced Secret. A file consumer sees the new value on next read; a Secret-as-volume consumer sees it in place; a Secret-as-env-var consumer sees nothing because env is a point-in-time snapshot at container start. Worst-case lag is roughly one poll interval plus the kubelet’s atomic-write sync — design for “live within a poll interval,” never sub-second.
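To make the lag budget concrete, here is a minimal Python sketch of the arithmetic and the per-pattern outcome described above. The function names, the pattern labels, and the one-minute kubelet sync figure are illustrative assumptions, not driver settings; substitute the values your cluster actually uses.

```python
from datetime import timedelta

# Whether a running container sees a rotated value without a restart.
# The pattern labels are this sketch's own names for the three cases in the text.
AUTO_REFRESH = {"file": True, "secret_volume": True, "env_var": False}


def worst_case_lag(poll_interval: timedelta, kubelet_sync: timedelta) -> timedelta:
    """Rough upper bound on staleness: one full poll plus the kubelet sync."""
    return poll_interval + kubelet_sync


def picks_up_rotation(pattern: str) -> bool:
    """True if the consumption pattern refreshes in place after a rotation."""
    return AUTO_REFRESH[pattern]


# Default 2m poll from the text; the 1m kubelet sync is an assumed figure.
lag = worst_case_lag(timedelta(minutes=2), timedelta(minutes=1))
print(f"plan for up to {int(lag.total_seconds())}s of staleness")
print("env var refreshes without restart:", picks_up_rotation("env_var"))
```

The point of the sketch is the shape of the budget: the poll interval dominates, and no interval setting rescues an env-var consumer.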

The vocabulary in one table

Before the deep sections, pin down every moving part. The glossary repeats these for lookup; this is the mental model side by side:

Concept	One-line definition	Where it lives	Why it matters here
Add-on	`azure-keyvault-secrets-provider`	AKS managed	Installs driver + Azure provider
CSI driver	Generic Secrets Store CSI DaemonSet	`kube-system`	Mounts the volume, runs rotation poll
Azure provider	Plugin that talks to Key Vault	`kube-system`	Auth + fetch from the vault
`SecretProviderClass`	Namespaced CRD describing what to fetch	Per namespace	The config you author
UAMI	User-assigned managed identity	Resource group	The identity that reads the vault
Federated credential (FIC)	Trust binding SA ↔ UAMI	On the UAMI	No client secret; subject must match
OIDC issuer	Cluster token issuer URL	AKS control plane	The `issuer` in the FIC
Workload-identity webhook	Injects token + `AZURE_*` env	AKS (add-on)	Triggered by the pod label
`secretObjects`	Block that syncs mount → K8s Secret	In the SPC	Optional; needed for env vars
`objectAlias`	Renames the mounted file	In the SPC `objects`	Sync keys off this filename
Poll interval	How often rotation checks the vault	Add-on config	Default 2m; trade freshness vs API load
tmpfs	In-memory mount backing	Node	Why secrets never hit disk/etcd

Enable the add-on and choose your identity model

The add-on installs the Secrets Store CSI Driver plus the Azure provider. On an existing cluster:

export RESOURCE_GROUP=rg-platform
export CLUSTER_NAME=aks-platform

az aks enable-addons \
  --addons azure-keyvault-secrets-provider \
  --name $CLUSTER_NAME \
  --resource-group $RESOURCE_GROUP

Enabling the add-on always creates a user-assigned managed identity named azurekeyvaultsecretsprovider-<cluster> in the node resource group (MC_...) and assigns it to the node VMSS. You cannot prevent its creation, but you do not have to use it. This walkthrough weighs two ways to authenticate to Key Vault, and the choice is a security decision, not a convenience one:

Model	Credential scope	Audit granularity	Best for	Blast radius if pod compromised
Add-on managed identity	Node-level, shared by every pod	One identity for all reads — unattributable	Quick demos, single-tenant clusters	Every secret the identity can read
Workload ID (recommended)	Per Kubernetes service account	Per-SA attribution in vault logs	Multi-team clusters, least privilege, PCI/SOC2	Only that SA’s granted secrets

Workload ID requires the OIDC issuer and the workload-identity webhook. If you create the cluster fresh, include both flags:

az aks create \
  --name $CLUSTER_NAME \
  --resource-group $RESOURCE_GROUP \
  --enable-addons azure-keyvault-secrets-provider \
  --enable-oidc-issuer \
  --enable-workload-identity \
  --generate-ssh-keys

On an existing cluster, enable them in place — idempotent, a control-plane reconcile, pods keep running:

az aks update \
  --name $CLUSTER_NAME \
  --resource-group $RESOURCE_GROUP \
  --enable-oidc-issuer \
  --enable-workload-identity

The same in Bicep, for the cluster resource:

resource aks 'Microsoft.ContainerService/managedClusters@2024-09-01' = {
  name: clusterName
  location: location
  identity: { type: 'SystemAssigned' }
  properties: {
    oidcIssuerProfile: { enabled: true }
    securityProfile: {
      workloadIdentity: { enabled: true }
    }
    addonProfiles: {
      azureKeyvaultSecretsProvider: {
        enabled: true
        config: {
          enableSecretRotation: 'true'
          rotationPollInterval: '2m'
        }
      }
    }
  }
}

Every flag in this group, what it does, and the cost of getting it wrong:

Flag / setting	What it enables	Default	Required for Workload ID?	If omitted
`--enable-addons azure-keyvault-secrets-provider`	Driver + Azure provider DaemonSets	off	Yes	No mount capability at all
`--enable-oidc-issuer`	Cluster OIDC issuer URL	off	Yes	No issuer to put in the FIC
`--enable-workload-identity`	Mutating webhook for token projection	off	Yes	Token never projected; auth fails
`enableSecretRotation` (config)	Rotation poll loop	`false`	No	Secrets fetched once, never refreshed
`rotationPollInterval` (config)	Poll cadence	`2m`	No	Falls back to `2m`; ignored unless rotation is on

Confirm the add-on landed and grab the auto-created identity details (useful for inventory even if you do not use it):

az aks show \
  --name $CLUSTER_NAME \
  --resource-group $RESOURCE_GROUP \
  --query addonProfiles.azureKeyvaultSecretsProvider

{
  "config": { "enableSecretRotation": "false", "rotationPollInterval": "2m" },
  "enabled": true,
  "identity": {
    "clientId": "00001111-aaaa-2222-bbbb-3333cccc4444",
    "objectId": "aaaaaaaa-0000-1111-2222-bbbbbbbbbbbb",
    "resourceId": ".../userAssignedIdentities/azurekeyvaultsecretsprovider-aksplatform"
  }
}
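If you script this check, a small parser over that JSON is enough to tell whether rotation is actually on. The sketch below is a hedged example: the key names come from the sample output above, while the helper name `rotation_settings` and the fallback values are my own assumptions.

```python
import json


def rotation_settings(profile_json: str) -> tuple[bool, str]:
    """Return (rotation_enabled, poll_interval) from the add-on profile JSON."""
    profile = json.loads(profile_json)
    # "config" can be absent or null when nothing was customised.
    config = profile.get("config") or {}
    # The add-on reports booleans as strings ("true"/"false") in the sample output.
    enabled = str(config.get("enableSecretRotation", "false")).lower() == "true"
    interval = config.get("rotationPollInterval", "2m")
    return enabled, interval


sample = """
{
  "config": { "enableSecretRotation": "false", "rotationPollInterval": "2m" },
  "enabled": true
}
"""
enabled, interval = rotation_settings(sample)
print(f"rotation enabled: {enabled}, poll interval: {interval}")
```

Feed it the output of the az aks show query above; a result of False means secrets are fetched once at mount and never refreshed.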

The driver and provider run as DaemonSets in kube-system:

kubectl get pods -n kube-system \
  -l 'app in (secrets-store-csi-driver,secrets-store-provider-azure)' -o wide

You should see aks-secrets-store-csi-driver-* (3/3 containers) and aks-secrets-store-provider-azure-* (1/1), one of each per node. What each managed component is and how to sanity-check it:

Component	Kind	Namespace	Healthy signal	If unhealthy
`aks-secrets-store-csi-driver`	DaemonSet	`kube-system`	3/3 per node, `Running`	Mounts hang cluster-wide
`aks-secrets-store-provider-azure`	DaemonSet	`kube-system`	1/1 per node, `Running`	Azure auth/fetch fails
`secretproviderclasses.secrets-store.csi.x-k8s.io`	CRD	cluster	`kubectl get crd` lists it	SPC apply errors
`secretproviderclasspodstatuses`	CRD	cluster	per-pod fetch status object	No status → fetch never ran
Add-on UAMI	Identity	`MC_*` RG	exists, VMSS-assigned	Add-on identity auth fails

Federate a managed identity to a service account

The heart of the passwordless model: create a user-assigned identity, grant it data-plane RBAC on the vault, then bind it to a service account via a federated credential. No client secret is ever issued.

export UAMI=id-app-secrets
export KEYVAULT_NAME=kv-platform-prod
export SA_NAME=app-sa
export SA_NAMESPACE=payments

# 1. Create the workload identity
az identity create --name $UAMI --resource-group $RESOURCE_GROUP

export USER_ASSIGNED_CLIENT_ID=$(az identity show \
  --resource-group $RESOURCE_GROUP --name $UAMI --query clientId -o tsv)
export IDENTITY_TENANT=$(az aks show \
  --name $CLUSTER_NAME --resource-group $RESOURCE_GROUP \
  --query identity.tenantId -o tsv)

The RBAC role depends on the object type, not the crypto/secret intuition

With an RBAC-enabled vault, the role you assign depends on what you mount — and this trips people up constantly because it does not follow the obvious split:

Object type in `SecretProviderClass`	Required built-in role	What you get back	Why this role
`secret`	Key Vault Secrets User	The secret value (or full PEM chain if against a cert)	Secret data-plane read
`key`	Key Vault Certificate User	The public key (PEM)	Retrieved via the cert path
`cert`	Key Vault Certificate User	The certificate only (PEM, no chain)	Retrieved via the cert path

Per the AKS docs, both key and cert object types require Key Vault Certificate User — the provider retrieves them through the certificate path, not a crypto operation. Because a Key Vault certificate is internally backed by both a key and a secret, mounting a full cert chain via objectType: secret against a certificate needs Key Vault Secrets User. Assign only what you mount:

export KEYVAULT_SCOPE=$(az keyvault show --name $KEYVAULT_NAME --query id -o tsv)

az role assignment create \
  --role "Key Vault Secrets User" \
  --assignee $USER_ASSIGNED_CLIENT_ID \
  --scope $KEYVAULT_SCOPE
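When an SPC mixes object types, it helps to derive the role set from the objects rather than from memory. This is a minimal sketch that encodes the role table above as a lookup; the function name `required_roles` is hypothetical and the mapping is only as authoritative as that table.

```python
# Role required per SecretProviderClass objectType, as listed in the table above.
ROLE_FOR_OBJECT_TYPE = {
    "secret": "Key Vault Secrets User",
    "key": "Key Vault Certificate User",
    "cert": "Key Vault Certificate User",
}


def required_roles(object_types):
    """Return the minimal, de-duplicated role list for the mounted object types."""
    unknown = sorted(set(object_types) - set(ROLE_FOR_OBJECT_TYPE))
    if unknown:
        raise ValueError(f"unsupported objectType(s): {unknown}")
    return sorted({ROLE_FOR_OBJECT_TYPE[t] for t in object_types})


# The example SPC in this article mounts two secrets and one key.
print(required_roles(["secret", "key", "secret"]))
```

For the example SPC this yields both roles, which is a reminder that the single Secrets User assignment above covers only the secret-type objects.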

A reference of the relevant Key Vault data-plane roles, so you grant the minimum and never reach for an admin role:

Role	Data-plane permissions	Use for	Do NOT use when
Key Vault Secrets User	get/list secrets	`objectType: secret`, full PEM chains	You only need cert public parts
Key Vault Certificate User	get/list certificates	`objectType: key`, `objectType: cert`	You mount the private key chain
Key Vault Crypto User	wrap/unwrap, sign/verify	Crypto ops (not this provider)	Mounting via CSI — never needed
Key Vault Secrets Officer	full secret CRUD	Pipelines that write secrets	A read-only mount identity
Key Vault Administrator	full data-plane admin	Break-glass / setup	Any workload identity (over-privileged)

RBAC vs the legacy access-policy model — know which your vault uses, because the wrong one silently grants nothing:

Authorization mode	How access is granted	How to grant the mount identity	Detect with
Azure RBAC (recommended)	`az role assignment create` at vault/secret scope	The role table above	`enableRbacAuthorization: true` on the vault
Vault access policy (legacy)	`az keyvault set-policy --secret-permissions get`	Per-identity policy, get/list	`enableRbacAuthorization: false`
Mixed (migration)	Both can be present	Match whichever the vault enforces	Vault property + a test mount

Bind the federated credential

Get the cluster’s OIDC issuer URL, create the service account annotated with the identity’s client ID, then bind the federated credential to the system:serviceaccount:<ns>:<name> subject:

export AKS_OIDC_ISSUER=$(az aks show \
  --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME \
  --query oidcIssuerProfile.issuerUrl -o tsv)

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ${SA_NAME}
  namespace: ${SA_NAMESPACE}
  annotations:
    azure.workload.identity/client-id: ${USER_ASSIGNED_CLIENT_ID}
EOF

az identity federated-credential create \
  --name fic-app-secrets \
  --identity-name $UAMI \
  --resource-group $RESOURCE_GROUP \
  --issuer ${AKS_OIDC_ISSUER} \
  --subject system:serviceaccount:${SA_NAMESPACE}:${SA_NAME} \
  --audience api://AzureADTokenExchange

The --subject must match the namespace and service account name exactly. A typo here produces the single most common failure mode — AADSTS70021: No matching federated identity record found — at pod startup, not at apply time. Every field in the FIC, what it must equal, and the error if it is wrong:

FIC field	Must equal	Source of truth	Error if wrong
`issuer`	Cluster OIDC issuer URL	`az aks show ... oidcIssuerProfile.issuerUrl`	`AADSTS70021` (no match)
`subject`	`system:serviceaccount:<ns>:<sa>`	The pod’s namespace + SA name	`AADSTS70021` (no match)
`audience`	`api://AzureADTokenExchange`	Fixed constant	No matching federated identity record (audience mismatch)
(SA annotation) `client-id`	UAMI `clientId`	`az identity show ... clientId`	Token requested for wrong identity; `AADSTS700016` if that ID does not exist in the tenant
(Pod) label `azure.workload.identity/use`	`"true"` (literal string)	Pod template metadata; required for the webhook	Token never projected at all

Common subject-mismatch shapes — eyeball this when AADSTS70021 appears:

What you wrote	Why it fails	Correct form
`system:serviceaccount:payments`	Missing the SA name segment	`system:serviceaccount:payments:app-sa`
`system:serviceaccount:Payments:app-sa`	Namespace is case-sensitive	`system:serviceaccount:payments:app-sa`
`system:serviceaccount:payments:app_sa`	SA name typo (`_` vs `-`)	`system:serviceaccount:payments:app-sa`
`serviceaccount:payments:app-sa`	Missing `system:` prefix	`system:serviceaccount:payments:app-sa`
Right subject, pod in `default` ns	Pod ran in a different namespace	Run pod in `payments`, or add a 2nd FIC
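The same eyeball check can be automated before you create the credential. The sketch below is an illustrative helper, not part of any Azure tooling: it builds the expected subject from the namespace and service account and names the first segment that differs. Both function names are hypothetical.

```python
def expected_subject(namespace: str, service_account: str) -> str:
    """The subject string the federated credential must carry."""
    return f"system:serviceaccount:{namespace}:{service_account}"


def diagnose_subject(fic_subject: str, namespace: str, service_account: str) -> str:
    """Compare a FIC subject against the pod's namespace and SA; explain any mismatch."""
    if fic_subject == expected_subject(namespace, service_account):
        return "ok"
    parts = fic_subject.split(":")
    if parts[:2] != ["system", "serviceaccount"]:
        return "missing 'system:serviceaccount:' prefix"
    if len(parts) != 4:
        return "expected exactly a namespace and a service account segment"
    if parts[2] != namespace:
        return f"namespace mismatch: {parts[2]!r} != {namespace!r}"
    return f"service account mismatch: {parts[3]!r} != {service_account!r}"


for candidate in (
    "system:serviceaccount:payments:app-sa",
    "system:serviceaccount:Payments:app-sa",
    "serviceaccount:payments:app-sa",
):
    print(candidate, "->", diagnose_subject(candidate, "payments", "app-sa"))
```

The comparison is a plain case-sensitive string match, which mirrors how the table above treats each broken form.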

Author the SecretProviderClass

The SecretProviderClass (SPC) is a namespaced CRD that tells the Azure provider which vault to hit, how to authenticate, and which objects to fetch. For workload identity, the two load-bearing settings are usePodIdentity: "false" and clientID set to the workload identity’s client ID — that combination signals the provider to use the projected service-account token.

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: app-kv
  namespace: payments
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"
    clientID: "00001111-aaaa-2222-bbbb-3333cccc4444"  # workload identity clientId
    keyvaultName: "kv-platform-prod"
    cloudName: ""                                      # defaults to AzurePublicCloud
    tenantId: "ffffffff-1111-2222-3333-444444444444"
    objects: |
      array:
        - |
          objectName: db-connection-string
          objectType: secret
          objectVersion: ""        # empty = latest
        - |
          objectName: signing-key
          objectType: key
          objectVersion: ""
        - |
          objectName: tls-app
          objectType: secret       # full PEM chain + private key
          objectAlias: tls-app.pem

Every parameter, end to end

The full parameters surface for the Azure provider — what each does, its default, and the gotcha:

Parameter	Purpose	Default	When to change	Gotcha
`usePodIdentity`	Selects pod-identity (deprecated) path	`"false"`	Never for Workload ID	Must be the string `"false"`, not bool
`useVMManagedIdentity`	Use node/add-on MI instead of WI	`"false"`	Add-on-identity model only	Mutually exclusive with `clientID` WI
`clientID`	Workload identity client ID	—	Always for Workload ID	Must match the SA annotation
`keyvaultName`	Vault short name (not URL)	—	Per vault	Name only, e.g. `kv-platform-prod`
`cloudName`	Azure cloud	`""` → Public	Gov / China clouds	Leave empty for commercial Azure
`tenantId`	Entra tenant GUID	—	Always	Cross-tenant needs the resource tenant
`objects`	Nested YAML of objects to fetch	—	Always	A YAML string; `|` and `- |` required

The nested objects document is the part that breaks most first attempts. Each item’s fields:

`objects` item field	Purpose	Values	Empty / default behaviour
`objectName`	Key Vault object name	the vault object’s name	required
`objectType`	Which object model path	`secret` | `key` | `cert`	required
`objectVersion`	Pin a specific version	a version GUID or `""`	`""` = latest (do this for rotation)
`objectAlias`	Rename the mounted file	any filename	defaults to `objectName`
`objectEncoding`	Encoding of the stored secret value	`utf-8` | `base64` | `hex`	`utf-8`; use `base64` for binary
`objectFormat`	PEM vs PFX for certs	`pem` \| `pfx`	`pem`

A few things worth internalizing:

objects is a YAML string containing another YAML document. The | block scalar and the - | per-item delimiters are required; the provider parses this nested document itself. Drop a - | and you get a parse error or a silently-skipped object.
objectVersion: "" resolves to the latest version. Leave it empty unless you must freeze a version — pinning defeats rotation.
objectAlias controls the filename on the mount. Without it, the file is named after objectName. This matters in the sync step, because sync keys off the mounted filename.
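Because the nested document is so easy to get wrong by hand, some teams generate it. The following is a minimal sketch of such a generator, assuming simple values that need no YAML quoting; `render_objects` is a hypothetical helper and it does no escaping beyond the empty-string case.

```python
def render_objects(items):
    """Render the nested `objects` document: an `array:` of `- |` block-scalar items."""
    lines = ["array:"]
    for item in items:
        if "objectName" not in item or "objectType" not in item:
            raise ValueError("objectName and objectType are required")
        lines.append("  - |")  # each item is its own block scalar
        for field, value in item.items():
            rendered = '""' if value == "" else value  # empty version means latest
            lines.append(f"    {field}: {rendered}")
    return "\n".join(lines) + "\n"


print(render_objects([
    {"objectName": "db-connection-string", "objectType": "secret", "objectVersion": ""},
    {"objectName": "tls-app", "objectType": "secret", "objectAlias": "tls-app.pem"},
]))
```

The returned string is what goes under the objects: | key of the SPC, indented to match the surrounding parameters block.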

For certificates, recall the Key Vault object model — the three object types return materially different things:

`objectType`	Returns	PEM contents	Typical consumer
`key`	Public key only	public key block	JWT signature verification
`cert`	Certificate only	certificate block, no chain, no key	Client-cert pinning, display
`secret` (vs a cert)	Private key + full cert chain	key block + cert chain	Ingress TLS termination

Ingress controllers want the last one. The version-pinning decision, spelled out:

`objectVersion` value	Behaviour	Rotation picks up?	Use when
`""` (empty)	Always latest enabled version	Yes	Default — you want rotation
explicit version GUID	Frozen to that version	No	Compliance freeze, repro of an incident
disabled latest version	Mount fails until re-enabled	n/a	Never intentionally

Mount as a volume and the startup coupling

The driver only fetches secrets when a pod mounts a CSI volume referencing the SPC — no pod, no secret, by design. The sharp consequence: the volume must mount successfully for the pod to start. If the identity is misconfigured or the object does not exist, the pod stays in ContainerCreating and you read the reason from events.

apiVersion: v1
kind: Pod
metadata:
  name: payments-api
  namespace: payments
  labels:
    azure.workload.identity/use: "true"   # required: opt the pod into the webhook
spec:
  serviceAccountName: app-sa              # must match the federated subject
  containers:
    - name: api
      image: ghcr.io/acme/payments-api:1.8.2
      volumeMounts:
        - name: kv
          mountPath: /mnt/secrets-store
          readOnly: true
  volumes:
    - name: kv
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: "app-kv"

Two non-negotiables here: the pod label azure.workload.identity/use: "true" (this makes the webhook inject the projected token and the AZURE_* env vars) and serviceAccountName matching the federated subject. Miss the label and the token is never projected; the provider fails with an auth error. The required volume/pod fields and what each one is for:

Field	Required value	Purpose	If wrong/missing
`labels.azure.workload.identity/use`	`"true"`	Opts pod into the webhook	No token projected → auth error
`serviceAccountName`	the federated SA	Identity the FIC trusts	`AADSTS70021` at mount
`volumes[].csi.driver`	`secrets-store.csi.k8s.io`	Selects the CSI driver	Volume not recognized
`volumeAttributes.secretProviderClass`	the SPC name	Which SPC to use	SPC not found, mount fails
`volumeMounts[].readOnly`	`true`	Secrets are read-only	Write attempts fail
`volumeMounts[].mountPath`	e.g. `/mnt/secrets-store`	Where files appear	App looks in the wrong place
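These field requirements lend themselves to a pre-deploy lint. The sketch below checks a pod manifest that has already been loaded into a Python dict (for example from your YAML loader of choice); the function name `check_pod` and the wording of the messages are assumptions of this example.

```python
def check_pod(pod: dict, federated_sa: str, spc_name: str) -> list[str]:
    """Return a list of problems with the workload-identity and CSI wiring of a pod."""
    problems = []
    labels = pod.get("metadata", {}).get("labels", {})
    # The label value must be the string "true"; a YAML boolean true does not count.
    if labels.get("azure.workload.identity/use") != "true":
        problems.append('label azure.workload.identity/use must be the string "true"')
    spec = pod.get("spec", {})
    if spec.get("serviceAccountName") != federated_sa:
        problems.append(f"serviceAccountName must be {federated_sa!r}")
    csi_volumes = [v["csi"] for v in spec.get("volumes", []) if "csi" in v]
    wired = any(
        c.get("driver") == "secrets-store.csi.k8s.io"
        and c.get("volumeAttributes", {}).get("secretProviderClass") == spc_name
        for c in csi_volumes
    )
    if not wired:
        problems.append(f"no CSI volume references SecretProviderClass {spc_name!r}")
    return problems


pod = {
    "metadata": {"labels": {"azure.workload.identity/use": "true"}},
    "spec": {
        "serviceAccountName": "app-sa",
        "volumes": [{
            "name": "kv",
            "csi": {
                "driver": "secrets-store.csi.k8s.io",
                "readOnly": True,
                "volumeAttributes": {"secretProviderClass": "app-kv"},
            },
        }],
    },
}
print(check_pod(pod, "app-sa", "app-kv") or "pod wiring looks consistent")
```

An empty list means the manifest is internally consistent; it says nothing about whether the federated credential or the vault RBAC behind it is correct.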

The secrets land as files:

kubectl exec -n payments payments-api -- ls /mnt/secrets-store/
# db-connection-string  signing-key  tls-app.pem

If the pod is stuck, the events tell you exactly which gate failed — and these strings map one-to-one to a cause:

Event substring (`kubectl describe pod`)	Root cause	Fix
`failed to get key vault token` / `no matching federated identity`	FIC subject/issuer mismatch	Fix `--subject`, re-create the FIC
`does not have secrets get permission`	Missing/incorrect Key Vault RBAC	Assign the right role at vault scope
`Secret not found` / `SecretNotFound`	`objectName` not in the vault	Correct the name or create the object
`MountVolume.SetUp failed` (provider)	Provider DaemonSet unhealthy	Check `aks-secrets-store-provider-azure`
`client-id ... not found`	SA annotation client-id wrong	Match annotation to UAMI `clientId`
Pod `Running` but no token env	Missing the `use: "true"` label	Add the label, recreate the pod
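If you triage these events often, the table translates directly into a small classifier. This is a hedged sketch: the patterns are the substrings from the table above, the ordering (specific causes before the generic mount failure) is my own design choice, and the sample messages are made up for illustration rather than copied from real driver output.

```python
import re

# (pattern, cause) pairs, most specific first. MountVolume.SetUp prefixes many
# failures, so it is only used as a fallback when nothing more specific matches.
RULES = [
    (r"no matching federated identity|failed to get key vault token", "FIC subject/issuer mismatch"),
    (r"does not have secrets get permission", "missing or incorrect Key Vault RBAC"),
    (r"secret ?not ?found", "objectName not in the vault"),
    (r"client-id.*not found", "service account client-id annotation wrong"),
    (r"mountvolume\.setup failed", "provider DaemonSet unhealthy (generic mount failure)"),
]


def classify_event(message: str) -> str:
    """Map a pod event message to the most likely root cause from the table."""
    text = message.lower()
    for pattern, cause in RULES:
        if re.search(pattern, text):
            return cause
    return "unknown"


# Hypothetical event text, shaped like a mount failure.
print(classify_event('MountVolume.SetUp failed for volume "kv": SecretNotFound'))
```

Pipe the events section of kubectl describe pod through it line by line and act on the first non-unknown result.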

Sync mounted objects into a native Kubernetes Secret

Files on a volume suit apps that read from disk, but many apps expect env vars, and a secret-backed env var has to reference a Kubernetes Secret. The driver mirrors mounted content into a real Secret via the secretObjects block. Add it to the same SPC:

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: app-kv
  namespace: payments
spec:
  provider: azure
  secretObjects:
    - secretName: app-db
      type: Opaque
      data:
        - objectName: db-connection-string   # must match the mounted FILENAME
          key: DB_CONNECTION_STRING           # key inside the K8s Secret
  parameters:
    # ... unchanged from the SPC above ...

The secretObjects schema, field by field:

`secretObjects` field	Purpose	Values	Gotcha
`secretName`	Name of the K8s Secret created	any DNS-1123 name	Must be unique per namespace
`type`	Kubernetes Secret type	`Opaque` | `kubernetes.io/tls` | `kubernetes.io/dockerconfigjson`	TLS type needs both `tls.crt`+`tls.key`
`data[].objectName`	The mounted filename	alias if set, else `objectName`	NOT the Key Vault name — the #1 trap
`data[].key`	Key inside the K8s Secret	any key name	This is what `secretKeyRef` references
`labels` / `annotations`	Metadata on the Secret	maps	Useful for Reloader auto-discovery

The critical rule: objectName under secretObjects.data must match the mounted filename, which is objectAlias if you set one, otherwise objectName. People wire this to the Key Vault object name and get an empty Secret. The exact mapping chain, so you can trace it end to end:

Stage	Value in the example	Determined by
Key Vault object	`tls-app`	The vault
`objects` item `objectName`	`tls-app`	Your SPC
`objects` item `objectAlias`	`tls-app.pem`	Your SPC (renames the file)
Mounted filename	`tls-app.pem`	The alias (or objectName if no alias)
`secretObjects.data.objectName`	`tls-app.pem`	Must equal the mounted filename
`secretObjects.data.key`	`tls.crt`	The K8s Secret key your app reads
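The mapping chain is mechanical enough to lint. The sketch below takes the objects items and the secretObjects data entries as plain Python dicts and reports every sync entry that points at a name that will never exist on the mount. The function names are hypothetical.

```python
def mounted_filenames(objects):
    """Filenames the driver will write: the alias if set, otherwise the objectName."""
    return {o.get("objectAlias") or o["objectName"] for o in objects}


def unsynced_entries(objects, secret_data):
    """secretObjects data entries whose objectName matches no mounted file."""
    files = mounted_filenames(objects)
    return [d["objectName"] for d in secret_data if d["objectName"] not in files]


objects = [
    {"objectName": "db-connection-string", "objectType": "secret"},
    {"objectName": "tls-app", "objectType": "secret", "objectAlias": "tls-app.pem"},
]

# Wired to the Key Vault name instead of the alias: this is the empty-Secret trap.
print(unsynced_entries(objects, [{"objectName": "tls-app", "key": "tls.crt"}]))
# Wired to the mounted filename: nothing to report.
print(unsynced_entries(objects, [{"objectName": "tls-app.pem", "key": "tls.crt"}]))
```

Running a check like this in CI catches the trap at review time instead of at the first empty Secret.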

Now consume it as an env var:

      env:
        - name: DB_CONNECTION_STRING
          valueFrom:
            secretKeyRef:
              name: app-db
              key: DB_CONNECTION_STRING

Two lifecycle facts that are easy to miss and cause incidents:

The synced Secret only exists after at least one pod mounts the volume. It is created on first mount, not on SPC apply. Anything that reads the Secret before that first pod (a Helm pre-install hook, another Deployment) gets a not-found.
The synced Secret is garbage-collected when the last consuming pod is deleted. The driver owns its lifecycle. Do not point unrelated workloads at it expecting it to persist independently.
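
Because the Secret appears only after the first mount, anything that depends on it needs to wait rather than assume. A minimal sketch of such a gate follows. The function name `wait_for_synced_secret`, the 5-second step, and the 120-second default are assumptions for illustration, not driver behaviour.

```shell
# Hypothetical helper: block until the driver has created the synced Secret.
# Arguments: secret name, namespace, timeout in seconds (default 120).
wait_for_synced_secret() {
  local name="$1"
  local namespace="$2"
  local timeout="${3:-120}"
  local waited=0
  until kubectl get secret "$name" -n "$namespace" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for secret/$name in $namespace" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "secret/$name is present in $namespace"
}
```

A dependent job could call `wait_for_synced_secret app-db <ns> 180` before reading the Secret. The gate only covers the creation side: it does not protect a consumer from the Secret being garbage-collected later.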

The supported synced-Secret types and when to reach for each:

Synced Secret `type`	Required keys	Produced from	Consumer
`Opaque`	any	one secret per key	env vars, generic file config
`kubernetes.io/tls`	`tls.crt`, `tls.key`	`objectType: secret` against a cert	Ingress TLS
`kubernetes.io/dockerconfigjson`	`.dockerconfigjson`	a secret holding registry JSON	`imagePullSecrets`
`kubernetes.io/basic-auth`	`username`, `password`	two secrets	basic-auth middleware

Enable auto-rotation, tune the poll interval, and understand propagation

Rotation is off by default. It is an add-on-level setting, not an SPC field. Enable it and optionally widen the poll interval:

az aks addon update \
  --resource-group $RESOURCE_GROUP \
  --name $CLUSTER_NAME \
  --addon azure-keyvault-secrets-provider \
  --enable-secret-rotation \
  --rotation-poll-interval 5m

The default poll interval is 2 minutes. The driver polls every interval, and on a detected change updates both the mounted file content and the synced Kubernetes Secret. Now the part nobody reads carefully — what actually reaches the application:

Consumption pattern	Picks up rotation automatically?	What you must do	Lag
App reads the mounted file	Yes, on next poll	App must re-read the file (watch or re-open per request)	≤ 1 poll + sync
App reads the synced Secret as a volume	Yes, on next poll	Mounted Secret volume contents update in place	≤ 1 poll + kubelet sync
App reads the synced Secret as an env var	No	Restart the pod (env is a start-time snapshot)	until restart
App reads via `subPath` mount	No	Remove `subPath`; mount the whole volume	never (kubelet limitation)

That env-var row is the trap. Environment variables are a point-in-time snapshot taken at container start. Rotating the secret updates the Kubernetes Secret, but a running container’s env block never changes. To close the loop, run something like Reloader, which watches synced Secrets and triggers a rolling restart:

metadata:
  annotations:
    reloader.stakater.com/auto: "true"

There is also a known Kubernetes limitation orthogonal to all of this: a Secret or ConfigMap mounted via subPath does not receive updates — that is a kubelet behavior, not a driver bug. Mount the whole volume, not a subPath, if you want in-place updates.
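
One way to find subPath mounts before they bite is to list every volumeMount with its subPath and filter for non-empty values. The sketch below assumes the input is tab-separated `name<TAB>subPath` lines. The function name `subpath_mounts` is made up for this example.

```shell
# Hypothetical helper: read "volumeMount-name<TAB>subPath" lines on stdin and
# print only the mounts that set a subPath, i.e. the ones that will not
# receive in-place updates.
subpath_mounts() {
  awk -F'\t' '$2 != "" { print $1 " (subPath=" $2 ")" }'
}

# Example feed from a live pod (pod and namespace are placeholders):
# kubectl get pod <pod> -n <ns> \
#   -o jsonpath='{range .spec.containers[*].volumeMounts[*]}{.name}{"\t"}{.subPath}{"\n"}{end}' \
#   | subpath_mounts
```

Empty output means no container in the pod mounts through a subPath.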

The poll-interval trade-off — faster polling is fresher but hits the Key Vault data plane more often:

Poll interval	Freshness (worst-case lag)	Key Vault API pressure	Use for
`30s` (minimum practical)	~30s + sync	Highest — watch throttling at scale	Tight rotation SLAs, few pods
`2m` (default)	~2m + sync	Moderate	Most workloads
`5m`	~5m + sync	Low	Large fleets, relaxed SLAs
`30m`+	~30m + sync	Minimal	Rarely-rotated secrets, cost-sensitive

Key Vault throttling is real — the data plane has request limits, and a large fleet polling aggressively can hit them:

Pressure source	Symptom	Confirm	Mitigation
Too-short poll × many pods	`429 Too Many Requests` in provider logs	Provider DaemonSet logs; KV metrics	Widen poll interval; fewer objects per SPC
Many distinct vaults polled	Aggregate request rate high	KV `ServiceApiHit` metric	Consolidate or stagger
Large `objects` arrays	Each poll fetches all objects	Count objects per SPC	Split SPCs; pin rarely-changing objects
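
To put a number on the first row, you can count throttled responses in the provider logs. This is a rough sketch: the function name `count_throttles` is illustrative, and the label selector in the commented usage line is an assumption that should be checked against the pods in your cluster's kube-system namespace.

```shell
# Hypothetical helper: count log lines on stdin that contain a standalone 429.
# The digit guards keep values such as "1429ms" from being counted.
count_throttles() {
  # grep -c exits non-zero when the count is 0, so mask that exit status.
  grep -c -E '(^|[^0-9])429([^0-9]|$)' || true
}

# Example feed (label selector is an assumption; --tail=-1 lifts the default
# line cap that kubectl applies when a selector is used):
# kubectl logs -n kube-system -l app=secrets-store-provider-azure \
#   --since=1h --tail=-1 | count_throttles
```

A count that climbs after you shorten the poll interval is a strong hint that the interval, not the vault, is the problem.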

Set realistic expectations on lag. Worst-case time from a Key Vault write to a file update is roughly one poll interval plus the kubelet’s atomic-write sync (usually under a minute). With env vars and Reloader, add the rollout time. Do not assume sub-second rotation; design for “new value is live within a poll interval, connections re-established on next use.”
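
The lag estimate is simple arithmetic, sketched here so it can sit next to an SLA in a runbook. The function names are illustrative, and the 60-second sync allowance is an assumption taken from the "usually under a minute" figure above rather than a guaranteed bound.

```shell
# Convert a poll interval such as 30s, 2m, or 1h into seconds.
interval_to_seconds() {
  case "$1" in
    *s) echo "${1%s}" ;;
    *m) echo $(( ${1%m} * 60 )) ;;
    *h) echo $(( ${1%h} * 3600 )) ;;
    *)  echo "unsupported interval: $1" >&2; return 1 ;;
  esac
}

# Worst-case seconds from a Key Vault write to an updated mounted file:
# one full poll interval plus a sync allowance (second argument, default 60).
worst_case_lag_seconds() {
  local poll_seconds
  poll_seconds="$(interval_to_seconds "$1")" || return 1
  echo $(( poll_seconds + ${2:-60} ))
}
```

For env-var consumers, add the rollout time of the workload on top of whatever this returns.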

TLS certificate consumption for ingress

A common goal is terminating TLS at an ingress controller with a cert that lives in Key Vault and rotates automatically. The pattern: mount objectType: secret against the certificate name (yielding the full PEM chain plus private key), sync it into a kubernetes.io/tls Secret, and point the Ingress at that Secret. This is the ingress-side complement to Kubernetes ingress controllers, TLS and routing.

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: app-tls
  namespace: ingress
spec:
  provider: azure
  secretObjects:
    - secretName: app-tls
      type: kubernetes.io/tls
      data:
        - objectName: tls-app.pem      # the mounted filename (objectAlias below)
          key: tls.crt
        - objectName: tls-app.pem
          key: tls.key
  parameters:
    usePodIdentity: "false"
    clientID: "00001111-aaaa-2222-bbbb-3333cccc4444"
    keyvaultName: "kv-platform-prod"
    tenantId: "ffffffff-1111-2222-3333-444444444444"
    objects: |
      array:
        - |
          objectName: app-cert
          objectType: secret           # full chain + key
          objectAlias: tls-app.pem

A kubernetes.io/tls Secret requires both tls.crt and tls.key. The provider splits the PEM bundle when you map both keys to the same mounted object, because objectType: secret against a certificate returns the concatenated private key and cert chain. The cert-to-TLS-Secret decisions you must get right:

Decision	Right choice	Wrong choice does	Why
Object type for the cert	`secret`	`cert` returns no key → no `tls.key`	TLS needs the private key
`objectFormat`	`pem`	`pfx` won’t split into crt/key cleanly	Ingress expects PEM
Both keys → same mounted file	`tls.crt` + `tls.key` from `tls-app.pem`	mapping different files	Provider splits one bundle
Keep a pod mounting the SPC alive	a keeper/pause pod	letting all consumers die	GC removes the synced TLS Secret
Rotation on	`--enable-secret-rotation`	off → cert never refreshes	Auto-renewed certs must propagate

Keep a pod mounting the SPC alive so the synced TLS Secret is never garbage-collected out from under the ingress controller — this is why teams run a tiny pause-style keeper pod alongside the controller. The garbage-collection trap, made explicit:

Scenario	Synced TLS Secret state	Ingress result
≥1 pod mounts the SPC	Secret present, fresh	TLS serves normally
All mounting pods deleted/evicted	Secret garbage-collected	Ingress 503 / cert disappears
Keeper pod holds the mount	Secret persists independently of app	TLS stable across app rollouts
Cert rotates, keeper alive	Secret updated in place on next poll	Ingress picks up new cert
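
A keeper can be as small as a single pause container that mounts the SPC. The sketch below prints such a Deployment. The function name `keeper_manifest`, the `-keeper` naming, the pause image tag, and the service account you pass in are all assumptions for illustration. The service account must already be federated to an identity that can read the vault.

```shell
# Hypothetical helper: print a minimal keeper Deployment that holds the CSI
# mount open. Arguments: namespace, SecretProviderClass name, service account.
keeper_manifest() {
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $2-keeper
  namespace: $1
spec:
  replicas: 1
  selector:
    matchLabels: { app: $2-keeper }
  template:
    metadata:
      labels:
        app: $2-keeper
        azure.workload.identity/use: "true"
    spec:
      serviceAccountName: $3
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          volumeMounts:
            - name: kv
              mountPath: /mnt/secrets-store
              readOnly: true
      volumes:
        - name: kv
          csi:
            driver: secrets-store.csi.k8s.io
            readOnly: true
            volumeAttributes:
              secretProviderClass: "$2"
EOF
}
```

Applied as `keeper_manifest ingress app-tls <service-account> | kubectl apply -f -`, this keeps one mount alive across application rollouts. A single replica still leaves a gap if its node is drained, so a second replica is worth considering where the TLS Secret is critical.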

Architecture at a glance

Read the diagram left to right as the token-and-secret path the way the provider walks it at mount time. On the far left, a labelled application pod in the payments namespace runs under the app-sa service account. Because the pod carries azure.workload.identity/use: "true", the AKS workload-identity webhook projects a short-lived OIDC token into the pod and injects the AZURE_* environment variables. The pod mounts a CSI volume that points at the app-kv SecretProviderClass. That mount is the trigger for everything downstream — with no pod and no mount, nothing here fires.

In the middle zone live the managed add-on components in kube-system: the Secrets Store CSI driver DaemonSet and the Azure provider DaemonSet. When the volume mounts, the provider takes the pod’s projected token and performs the federated exchange against Entra — the federated identity credential on the user-assigned managed identity says “this OIDC issuer plus system:serviceaccount:payments:app-sa equals this identity,” so Entra returns an access token with no client secret ever involved. The provider then calls the Key Vault data plane over HTTPS 443, authorized by the identity’s RBAC role (Key Vault Secrets User for a secret, Certificate User for a key/cert). The returned objects are written to a tmpfs mount at /mnt/secrets-store, and — if secretObjects is set — mirrored into a native Kubernetes Secret for env-var consumption. With rotation enabled, the driver re-polls the vault every interval (default 2m) and rewrites the file and synced Secret on change. The numbered badges mark the exact hops where this path breaks, and the legend narrates each as symptom · confirm · fix.

The five badges map to the five things that actually go wrong in production: a federation subject mismatch (AADSTS70021) at the Entra exchange, a wrong RBAC role (403) at the vault, an objectName-vs-filename mismatch that yields an empty synced Secret, an env-var consumer that never sees a rotation, and a garbage-collected synced TLS Secret when the last consuming pod dies. Hold this picture and every later failure mode has a home.

Real-world scenario

A payments platform team ran a 40-node AKS cluster shared by eight product squads. They had standardized early on the add-on’s auto-created managed identity, granting it Key Vault Secrets User on a single shared vault. It worked — and it became a finding in their PCI assessment. Because the credential was node-scoped, every pod on every node could read every secret in the vault. The blast radius of one compromised pod was the entire secret store, and the audit log could not attribute a secret get to a workload, since all reads came from one identity.

The constraint: they could not split into per-team clusters (cost and operational load), nor take a maintenance window long enough to re-platform. They needed per-squad isolation on the existing cluster, with auditable, attributable Key Vault access, and zero downtime.

The fix was a migration to workload identity, one squad at a time. They enabled the OIDC issuer and workload-identity webhook in place (az aks update --enable-oidc-issuer --enable-workload-identity — a control-plane reconcile; pods kept running). Each squad got its own user-assigned identity, its own vault, and a federated credential bound to that squad’s service account:

# Per squad: scope the identity to exactly its service account + its vault
az identity create --name id-squad-payments --resource-group rg-platform
CID=$(az identity show -g rg-platform -n id-squad-payments --query clientId -o tsv)

az role assignment create --role "Key Vault Secrets User" \
  --assignee "$CID" \
  --scope "$(az keyvault show -n kv-squad-payments --query id -o tsv)"

az identity federated-credential create \
  --name fic-payments --identity-name id-squad-payments --resource-group rg-platform \
  --issuer "$AKS_OIDC_ISSUER" \
  --subject system:serviceaccount:payments:payments-sa \
  --audience api://AzureADTokenExchange

The migration was reversible per workload: they kept the add-on identity’s role assignment until each squad’s pods rolled over to the new SPC (usePodIdentity: "false", clientID set to the squad’s identity) and verified mounts, then revoked the shared assignment last. Because rotation was already on, no app-side change was needed on the data path — only the identity backing the mount changed.

What the migration cost and bought, squad by squad:

Phase	Action	Downtime	Risk	Rollback
0	Enable OIDC + WI webhook	None	Control-plane reconcile only	Disable workload identity; the OIDC issuer cannot be disabled once enabled
1	Per-squad UAMI + vault + FIC	None	New resources, no traffic yet	Delete the resources
2	Roll one squad’s pods to new SPC	Rolling, none	Mount could fail → caught in canary	Roll back the Deployment
3	Verify per-squad vault logs attribute reads	None	Observation only	n/a
4	Revoke shared add-on identity role last	None	Only after all squads migrated	Re-add the role assignment

The finding closed: Key Vault diagnostic logs now attributed every SecretGet to a named per-squad identity, and a compromised pod could reach exactly one squad’s secrets. The before/after in numbers:

Property	Before (shared add-on identity)	After (per-squad Workload ID)
Identities reading the vault	1 (node-scoped)	8 (one per squad)
Blast radius of one popped pod	All secrets, all squads	That squad’s secrets only
Audit attribution	Unattributable	Per-squad in KV logs
Client secrets stored	0 (already MI)	0 (federated)
Downtime to migrate	n/a	Zero

Advantages and disadvantages

The honest two-column trade-off:

Advantages	Disadvantages
Secrets never in git or (by default) etcd	Mount coupled to a running pod — no pod, no secret
Passwordless: federated, no client secret	Steeper setup than `kubectl create secret`
Per-SA identity → least privilege + audit	Subject-mismatch errors surface only at pod start
tmpfs mount: memory-backed, not on disk	Env-var consumers need Reloader to see rotation
Auto-rotation pulls new versions on a poll	Rotation lag = poll interval + sync (not instant)
First-party AKS add-on, platform-maintained	Synced Secret GC’d when last consumer dies
Works for secrets, keys, and TLS certs	RBAC role per object type is non-intuitive
Centralized rotation/versioning in Key Vault	Aggressive polling can throttle the vault data plane

When each side matters: choose the CSI driver whenever you need auditable, attributable, rotatable secret access on a shared cluster — the disadvantages are operational learning curves, not architectural dead-ends. Prefer plain Kubernetes Secrets only for throwaway dev clusters where audit and rotation are non-goals. For env-var-heavy apps where you cannot adopt Reloader, lean on file or mounted-Secret-volume consumption so rotation reaches the app without a restart. Where to use which model:

Situation	Use	Why
Multi-squad shared cluster, audit required	Workload ID + CSI	Per-SA attribution and least privilege
Single-tenant demo cluster	Add-on identity + CSI (or plain Secret)	Speed over isolation
App reads config from files	CSI mounted file	Rotation reaches it on next read
App reads config from env vars	CSI sync + Reloader	Restart on rotation
Ingress TLS from a rotating cert	CSI sync to `kubernetes.io/tls` + keeper pod	Cert propagates, Secret survives GC
Crypto operations (sign/verify)	Key Vault SDK directly, not CSI	Provider doesn’t do crypto ops

Hands-on lab

A copy-pasteable walkthrough. It assumes an existing AKS cluster and rights to create a Key Vault and assign RBAC roles on it. Costs are negligible (a UAMI is free; Key Vault operations cost a few rupees per 10,000 transactions).

# 0. Variables
export RESOURCE_GROUP=rg-lab
export CLUSTER_NAME=aks-lab
export KEYVAULT_NAME=kv-lab-$RANDOM
export UAMI=id-lab-secrets
export SA_NAME=demo-sa
export SA_NAMESPACE=demo

# 1. Enable the add-on + Workload ID prerequisites (idempotent)
az aks update -g $RESOURCE_GROUP -n $CLUSTER_NAME \
  --enable-oidc-issuer --enable-workload-identity
az aks enable-addons -g $RESOURCE_GROUP -n $CLUSTER_NAME \
  --addons azure-keyvault-secrets-provider
az aks get-credentials -g $RESOURCE_GROUP -n $CLUSTER_NAME --overwrite-existing

# 2. Create the vault (RBAC mode) and a test secret
az keyvault create -g $RESOURCE_GROUP -n $KEYVAULT_NAME --enable-rbac-authorization true
az role assignment create --role "Key Vault Secrets Officer" \
  --assignee "$(az ad signed-in-user show --query id -o tsv)" \
  --scope "$(az keyvault show -n $KEYVAULT_NAME --query id -o tsv)"
# RBAC assignments can take a minute or two to propagate; retry on Forbidden
az keyvault secret set --vault-name $KEYVAULT_NAME --name db-connection-string \
  --value "Server=db;Pwd=initial"

# 3. Create the workload identity and grant it READ on the vault
az identity create -g $RESOURCE_GROUP -n $UAMI
export CID=$(az identity show -g $RESOURCE_GROUP -n $UAMI --query clientId -o tsv)
export TID=$(az aks show -g $RESOURCE_GROUP -n $CLUSTER_NAME --query identity.tenantId -o tsv)
az role assignment create --role "Key Vault Secrets User" --assignee "$CID" \
  --scope "$(az keyvault show -n $KEYVAULT_NAME --query id -o tsv)"

# 4. Service account + federated credential
export ISSUER=$(az aks show -g $RESOURCE_GROUP -n $CLUSTER_NAME \
  --query oidcIssuerProfile.issuerUrl -o tsv)
kubectl create namespace $SA_NAMESPACE
kubectl create serviceaccount $SA_NAME -n $SA_NAMESPACE
kubectl annotate serviceaccount $SA_NAME -n $SA_NAMESPACE \
  azure.workload.identity/client-id=$CID
az identity federated-credential create --name fic-lab --identity-name $UAMI \
  -g $RESOURCE_GROUP --issuer "$ISSUER" \
  --subject system:serviceaccount:$SA_NAMESPACE:$SA_NAME \
  --audience api://AzureADTokenExchange

# 5. SecretProviderClass (note clientID, tenantId, and the sync block)
cat <<EOF | kubectl apply -f -
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: lab-kv
  namespace: $SA_NAMESPACE
spec:
  provider: azure
  secretObjects:
    - secretName: lab-db
      type: Opaque
      data:
        - objectName: db-connection-string
          key: DB_CONNECTION_STRING
  parameters:
    usePodIdentity: "false"
    clientID: "$CID"
    keyvaultName: "$KEYVAULT_NAME"
    tenantId: "$TID"
    objects: |
      array:
        - |
          objectName: db-connection-string
          objectType: secret
          objectVersion: ""
EOF

# 6. A pod that mounts the volume AND reads the synced Secret as env
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: demo
  namespace: $SA_NAMESPACE
  labels:
    azure.workload.identity/use: "true"
spec:
  serviceAccountName: $SA_NAME
  containers:
    - name: app
      image: mcr.microsoft.com/azure-cli:latest
      command: ["sleep","3600"]
      env:
        - name: DB_CONNECTION_STRING
          valueFrom:
            secretKeyRef: { name: lab-db, key: DB_CONNECTION_STRING }
      volumeMounts:
        - name: kv
          mountPath: /mnt/secrets-store
          readOnly: true
  volumes:
    - name: kv
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: "lab-kv"
EOF

# 7. Verify the mount, the synced Secret, and env injection
kubectl wait --for=condition=Ready pod/demo -n $SA_NAMESPACE --timeout=120s
kubectl exec -n $SA_NAMESPACE demo -- ls /mnt/secrets-store/
kubectl exec -n $SA_NAMESPACE demo -- cat /mnt/secrets-store/db-connection-string; echo
kubectl get secret lab-db -n $SA_NAMESPACE -o jsonpath='{.data.DB_CONNECTION_STRING}' | base64 -d; echo
kubectl exec -n $SA_NAMESPACE demo -- printenv DB_CONNECTION_STRING

# 8. Prove rotation reaches the FILE but NOT the env var
az aks addon update -g $RESOURCE_GROUP -n $CLUSTER_NAME \
  --addon azure-keyvault-secrets-provider --enable-secret-rotation
az keyvault secret set --vault-name $KEYVAULT_NAME --name db-connection-string \
  --value "Server=db;Pwd=ROTATED-$(date +%s)"
sleep 150   # ~ one default poll interval + sync slack
kubectl exec -n $SA_NAMESPACE demo -- cat /mnt/secrets-store/db-connection-string; echo  # NEW value
kubectl exec -n $SA_NAMESPACE demo -- printenv DB_CONNECTION_STRING                       # OLD value (snapshot)

# 9. Teardown
kubectl delete namespace $SA_NAMESPACE
az identity federated-credential delete --name fic-lab --identity-name $UAMI -g $RESOURCE_GROUP --yes
az identity delete -g $RESOURCE_GROUP -n $UAMI
az keyvault delete -n $KEYVAULT_NAME -g $RESOURCE_GROUP
az keyvault purge -n $KEYVAULT_NAME   # soft-delete retains the vault; purge fails if purge protection is enabled

Expected outputs at each verify step:

Step	Command	Expected result
7	`ls /mnt/secrets-store/`	`db-connection-string`
7	`cat .../db-connection-string`	`Server=db;Pwd=initial`
7	`get secret lab-db ... base64 -d`	`Server=db;Pwd=initial`
7	`printenv DB_CONNECTION_STRING`	`Server=db;Pwd=initial`
8	`cat .../db-connection-string` (post-rotate)	`Server=db;Pwd=ROTATED-...` (NEW)
8	`printenv DB_CONNECTION_STRING` (post-rotate)	`Server=db;Pwd=initial` (OLD — the lesson)

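Step 8 is the whole point of the lab: the file updates, the env var does not. To make the env var update too, run the workload as a Deployment rather than the bare Pod used here (Reloader restarts workload controllers, not standalone Pods), annotate it for Reloader, and let Reloader roll the pods.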

Common mistakes & troubleshooting

The differentiator. Every failure mode here is one we have actually hit. Read the playbook table top to bottom mid-incident; the prose under it explains the gnarly ones.

#	Symptom	Root cause	Confirm (exact command/path)	Fix
1	Pod stuck `ContainerCreating`	Volume mount failing for some reason	`kubectl describe pod <p> -n <ns>` → Events	Read the event string; jump to the matching row below
2	Event: `No matching federated identity record found` / `AADSTS70021`	FIC `--subject` or `issuer` mismatch	`az identity federated-credential list --identity-name <uami> -g <rg>`	Recreate FIC with exact `system:serviceaccount:<ns>:<sa>` and the cluster issuer
3	Event: `does not have secrets get permission` (`403`)	Wrong/missing Key Vault RBAC	`az role assignment list --assignee <cid> --scope <kv-id>`	Assign correct role (Secrets vs Certificate User) at vault scope
4	Event: `Secret not found`	`objectName` not in the vault (typo/wrong vault)	`az keyvault secret show --vault-name <kv> --name <obj>`	Fix the name or create the object
5	Synced Secret exists but `data: {}` (empty)	`secretObjects.data.objectName` ≠ mounted filename	Compare SPC value to `kubectl exec ... ls /mnt/secrets-store`	Set it to the alias (or `objectName` if no alias)
6	Synced Secret never appears	No pod has mounted the SPC yet	`kubectl get secretproviderclasspodstatus -n <ns>`	Deploy a pod that mounts the volume
7	Synced Secret vanished; ingress 503	GC’d — last consuming pod deleted	`kubectl get pods -n <ns>` referencing the SPC	Run a keeper pod that holds the mount
8	Rotated value never reaches app	App reads env var (start-time snapshot)	`printenv` shows old; file shows new	Use file/Secret-volume consumption, or Reloader to restart
9	Rotation never updates a mounted file	Rotation disabled, or `subPath` mount	`az aks show ... config`; check `subPath` in the volume	Enable rotation; remove `subPath`
10	Auth fails despite correct FIC	Pod missing `azure.workload.identity/use: "true"`	`kubectl get pod <p> -o yaml | grep workload.identity`	Add the label; recreate the pod
11	`clientID ... not found`	SA annotation client-id wrong	`kubectl get sa <sa> -n <ns> -o yaml` vs `az identity show`	Match annotation to the UAMI `clientId`
12	`429 Too Many Requests` from Key Vault	Poll interval too short for fleet size	Provider DaemonSet logs; KV `ServiceApiHit`	Widen `--rotation-poll-interval`; split SPCs
13	TLS Secret empty / ingress handshake fails	`objectType: cert` (no key) instead of `secret`	Inspect SPC `objects` and synced Secret keys	Use `objectType: secret` for full chain + key
14	SPC apply errors / object silently skipped	Malformed nested `objects` YAML (`- |`)	`kubectl get secretproviderclass <name> -n <ns> -o yaml`; re-validate indentation	Fix the block scalar; one `- |` per item
15	Cross-tenant vault `AADSTS` error	`tenantId` is the cluster tenant, not the vault’s	`az keyvault show ... --query properties.tenantId`	Set `tenantId` to the vault’s resource tenant
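
Row 1 tells you to read the event string and jump to the matching row. That lookup can be sketched as a small function. The name `triage_event` is illustrative, and the match strings are only the ones quoted in the table. Real events may be worded differently, so treat a miss as a prompt to read the full event.

```shell
# Hypothetical helper: map a mount-failure event string to a playbook row.
triage_event() {
  case "$1" in
    *AADSTS70021*|*"No matching federated identity record found"*)
      echo "row 2: federated credential subject or issuer mismatch" ;;
    *"does not have secrets get permission"*)
      echo "row 3: wrong or missing Key Vault RBAC role" ;;
    *"Secret not found"*)
      echo "row 4: objectName not present in the vault" ;;
    *"429"*|*"Too Many Requests"*)
      echo "row 12: Key Vault throttling, widen the poll interval" ;;
    *)
      echo "no match: read the full event from kubectl describe pod" ;;
  esac
}
```

You would pass it the message copied from the Events section of `kubectl describe pod <p> -n <ns>`.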

The three that burn the most hours

Subject mismatch (#2). The FIC error appears at pod start, never at apply, because the binding is only exercised when a token is actually exchanged. The subject is case-sensitive and has a fixed shape: system:serviceaccount:<namespace>:<serviceaccount>. The usual culprits are small: a _ for a -, a wrong namespace, or a missing system: prefix all produce the identical AADSTS70021. Confirm with az identity federated-credential list and compare byte-for-byte to your pod’s serviceAccountName and namespace.

Empty synced Secret (#5). You apply the SPC, the pod runs, the file is on the mount — but kubectl get secret app-db -o jsonpath='{.data}' returns {}. The cause is almost always that secretObjects.data.objectName was set to the Key Vault object name when it must be the mounted filename. If you set objectAlias: tls-app.pem, the file is tls-app.pem and the sync must reference tls-app.pem, not tls-app. Trace the mapping chain: vault name → objectName → objectAlias → mounted filename → secretObjects.data.objectName.

Env-var staleness (#8). Everything looks healthy — the vault has the new value, the file has the new value, the synced Secret has the new value — yet the app behaves on the old credential. Environment variables are injected once at container start and never change. Confirm by comparing cat /mnt/secrets-store/<file> (new) to printenv <VAR> (old). The fix is structural: read from the file or a mounted Secret volume, or annotate the workload for Reloader so a rotation triggers a rolling restart.
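
For the subject mismatch (#2) in particular, the byte-for-byte comparison is easy to script so that nobody eyeballs a hyphen against an underscore. This is a minimal sketch: the function names are illustrative, and the input is assumed to be one federated-credential subject per line.

```shell
# Build the subject string expected for a namespace and service account.
expected_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Hypothetical helper: read subjects (one per line) on stdin and succeed only
# if one of them equals the expected subject exactly (whole line, case-sensitive).
subject_is_federated() {
  grep -qxF "$(expected_subject "$1" "$2")"
}

# Example feed (identity and resource group are placeholders):
# az identity federated-credential list --identity-name <uami> -g <rg> \
#   --query '[].subject' -o tsv | subject_is_federated <ns> <sa>
```

A non-zero exit means no credential on that identity matches the pod's namespace and service account, which is the condition behind AADSTS70021. The issuer still needs a separate check.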

Best practices

Prefer Workload ID over the add-on identity on any shared cluster. Per-SA identities give you least privilege and per-workload audit attribution; the node-scoped add-on identity gives you neither.
Grant the minimum RBAC role for the object type — Secrets User for secrets/PEM chains, Certificate User for keys/certs. Never reach for Administrator on a mount identity.
Scope role assignments to the vault (or the individual secret), not the subscription. Tighten to per-secret scope where the platform supports it.
Leave objectVersion: "" so rotation pulls the latest; pin a version only for a deliberate compliance freeze.
Consume secrets as files or mounted-Secret volumes when you can, so rotation reaches the app without a restart. Reserve env vars for values that genuinely never rotate, or front them with Reloader.
Never mount via subPath if you want updates — kubelet will not refresh a subPath-mounted Secret/ConfigMap.
Run a keeper pod for any synced Secret an ingress controller or other long-lived consumer depends on, so the driver’s GC never pulls it out from under them.
Set the poll interval deliberately — match it to your rotation SLA and fleet size, and watch Key Vault throttling at the tight end.
Keep objects arrays small per SPC — split by concern so a single poll does not fetch a huge object set, and so one bad object does not block a whole mount.
Treat a failed mount as a pod-start failure in your readiness design — alert on ContainerCreating that does not clear within a minute.
Enable Key Vault diagnostic logs and verify per-identity attribution after any identity migration.
Version-control the SPC and IaC together with the identity and role assignment, so the whole chain deploys and reviews as one unit (see Terraform module for Azure Key Vault).

Security notes

The CSI driver’s security value is that secret material never touches git, never (by default) touches etcd, and lands only on a memory-backed tmpfs mount — but you still have to wire identity and network correctly. The controls that matter, with the setting that enforces each:

Control	Why it matters	How to enforce	Verify
Per-SA Workload ID	Least privilege, audit attribution	UAMI + FIC per service account	Vault logs show per-SA `SecretGet`
Minimal RBAC role	Limits what a popped pod can read	`Secrets/Certificate User` at vault scope	`az role assignment list --scope <kv>`
No K8s sync unless needed	Keeps secrets out of etcd	Omit `secretObjects` for file-only apps	`kubectl get secret` shows none synced
tmpfs mount	Secrets never written to node disk	Default behaviour of the driver	Mount is memory-backed
Vault firewall / Private Endpoint	Vault not reachable from the internet	KV network rules + PE to the cluster subnet	`az keyvault show ... networkAcls`
Soft-delete + purge protection	Recover a deleted/rotated secret	Vault properties	`az keyvault show ... enableSoftDelete`
Pod label gating	Only intended pods get a token	`azure.workload.identity/use: "true"`	Absent label → no token projected
Read-only mount	Pod cannot tamper with secrets	`readOnly: true` on the volume	Write attempts fail

Network isolation deserves emphasis: put the vault behind a Private Endpoint into the cluster’s subnet and set networkAcls to deny public traffic, so even a leaked identity cannot reach the vault from outside your network. Combine with Entra managed identities and federated credentials hardening — short-lived federated tokens with no stored secret are the baseline, and Conditional Access on the workload identity raises the bar further. The least-privilege ladder, from worst to best:

Posture	Identity	Scope	Verdict
Worst	Add-on identity, shared	Whole vault	Unattributable, max blast radius
Better	Per-SA UAMI	Whole vault	Attributable, still broad
Good	Per-SA UAMI	Per-secret scope	Least privilege per workload
Best	Per-SA UAMI + PE-only vault + CA	Per-secret scope	Network + identity + policy defence-in-depth

Cost & sizing

The CSI driver itself is free — it is a first-party add-on with no licence cost. What you pay for is Key Vault operations and a trivial amount of node resource for the DaemonSets. The cost drivers and rough figures (INR at ~₹84/USD, indicative):

Cost driver	Unit	Rough cost	Notes
Secrets Store CSI add-on	per cluster	₹0 (free)	First-party AKS add-on
Key Vault operations	per 10,000 transactions	~₹2.5 (~$0.03)	Each poll fetches each object = transactions
Key Vault (Standard tier)	per vault	no base fee; pay per op	Premium adds HSM-backed keys
Key Vault (Premium HSM)	per HSM key/month	~₹85 (~$1) + ops	Only if you need HSM-backed keys
UAMI	per identity	₹0 (free)	No charge for managed identities
DaemonSet CPU/memory	per node	negligible	Tiny footprint per node
NAT/Private Endpoint (optional)	per endpoint/hour	~₹0.8/hr (~$0.01)	If you isolate the vault network

The variable that actually moves the bill is poll frequency × object count × pod count, because each poll fetches each object as a billable transaction. A worked example, using transactions/day = pods × objects × (86,400 ÷ poll interval in seconds) at ~₹2.5 per 10,000 transactions:

Scenario	Pods	Objects/SPC	Poll interval	Transactions/day	Rough cost/day
Small app	3	2	2m	~4,320	~₹1
Medium fleet	50	3	2m	~108,000	~₹27
Large fleet, tight poll	200	4	30s	~2.3M	~₹576
Large fleet, relaxed poll	200	4	5m	~230,400	~₹58

The right-sizing levers, in order of impact: widen the poll interval to your real rotation SLA (a 5m interval is 2.5× cheaper than 2m and usually fine), pin rarely-changing objects to a version so they are not re-fetched every poll, and split SPCs so unrelated objects do not all poll on the same cadence. The Standard tier has no base fee; reach for Premium only when you genuinely need HSM-backed keys. There is no free-tier limit to worry about here — the per-transaction cost is small, and the failure mode at the high end is throttling (429), not a surprise invoice.
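
To size your own fleet, the same model (one transaction per object per pod per poll) fits in two small functions. This is a sketch under that assumption only. The function names are illustrative, and the default rate of ₹2.5 per 10,000 transactions is the indicative figure from the cost table, not a quoted price.

```shell
# Estimated Key Vault transactions per day: pods x objects x polls per day.
# Arguments: pod count, objects per SPC, poll interval in seconds.
transactions_per_day() {
  echo $(( $1 * $2 * (86400 / $3) ))
}

# Rough daily cost in rupees.
# Arguments: transactions per day, rate per 10,000 transactions (default 2.5).
daily_cost_inr() {
  awk -v tx="$1" -v rate="${2:-2.5}" 'BEGIN { printf "%.2f\n", tx / 10000 * rate }'
}
```

Chaining them, as in `daily_cost_inr "$(transactions_per_day 200 4 300)"`, makes it cheap to compare poll intervals before changing the add-on setting. The model ignores retries and any extra calls the provider may make per object, so treat the result as a floor.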

Interview & exam questions

Q1. Why does the add-on always create a managed identity even when you intend to use Workload ID? The add-on provisions a user-assigned managed identity in the node resource group and assigns it to the VMSS as part of installation; you cannot suppress it. You simply do not use it — your SPC points at a different workload identity via clientID and usePodIdentity: "false". Maps to AZ-500/CKS identity topics.

Q2. A pod is stuck in ContainerCreating with AADSTS70021. What is wrong and how do you confirm? The federated credential’s subject (or issuer) does not match the pod’s system:serviceaccount:<ns>:<sa> and the cluster OIDC issuer. Confirm with az identity federated-credential list and compare to the pod’s namespace and serviceAccountName. The error appears at pod start, not at apply.

Q3. You mount an objectType: secret against a Key Vault certificate. What do you get, and what RBAC role is required? You get the full PEM chain plus the private key (the cert’s secret backing), which requires Key Vault Secrets User. By contrast, key/cert object types go through the certificate path and need Key Vault Certificate User.

Q4. The synced Kubernetes Secret exists but is empty. Most likely cause? secretObjects.data.objectName was set to the Key Vault object name instead of the mounted filename. If an objectAlias is set, the filename is the alias and the sync must reference it. Fix the mapping.

Q5. You rotate a secret in Key Vault. The mounted file updates but the app keeps using the old value. Why? The app reads the secret as an environment variable, which is a snapshot taken at container start and never changes. Switch to file or mounted-Secret-volume consumption, or use Reloader to restart the pod on Secret change.

Q6. Why does a subPath-mounted Secret not pick up rotation? It is a kubelet limitation: subPath volume mounts do not receive in-place updates. Mount the whole volume instead of a subPath.

Q7. What happens to the synced Secret when the last pod consuming it is deleted, and why does it matter for ingress? The driver garbage-collects it. An ingress controller depending on a synced kubernetes.io/tls Secret will lose its cert and start failing. Run a keeper pod that holds the mount so the Secret persists.

Q8. Where do secrets land on the node, and what is the significance for etcd? On a tmpfs (memory-backed) mount, not on disk. Unless you opt into the K8s sync via secretObjects, nothing is written to etcd — which is the security win over plain Kubernetes Secrets.

Q9. How do you make an env-var-consuming app pick up rotation automatically? You cannot change a running container’s env. Annotate the workload with reloader.stakater.com/auto: "true" (running Reloader) so a change to the synced Secret triggers a rolling restart that re-injects the new value.
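
A minimal sketch of that arrangement, assuming Reloader is already installed in the cluster. The Deployment, Secret, service account, image, and SPC names are hypothetical. The pod mounts the CSI volume so the synced Secret exists, and reads the value through an env var that only a restart can refresh.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api                # hypothetical name
  namespace: payments
  annotations:
    reloader.stakater.com/auto: "true"   # restart when a referenced Secret changes
spec:
  replicas: 2
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
        azure.workload.identity/use: "true"
    spec:
      serviceAccountName: payments-sa     # hypothetical service account
      containers:
        - name: app
          image: registry.example.com/payments-api:1.0   # placeholder image
          env:
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: payments-db       # hypothetical synced Secret
                  key: password
          volumeMounts:
            - name: kv
              mountPath: /mnt/secrets-store
              readOnly: true
      volumes:
        - name: kv
          csi:
            driver: secrets-store.csi.k8s.io
            readOnly: true
            volumeAttributes:
              secretProviderClass: payments-kv   # hypothetical SPC name
```

The restart is a normal rolling update, so the usual readiness probes and disruption budgets still govern how the new value rolls out.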

Q10. Your fleet started getting 429 from Key Vault after enabling rotation. What changed and how do you fix it? Each poll fetches each object as a transaction; a short poll interval across many pods/objects exceeds the data-plane request limit. Widen --rotation-poll-interval, split SPCs, and pin rarely-changing objects to reduce per-poll fetches.

Q11. Which two SPC parameters signal Workload ID specifically, and what value must each take? usePodIdentity: "false" and clientID set to the workload identity’s clientId (a string). That combination tells the provider to use the projected service-account token.

Q12. Why is the CSI mount fundamentally pull-based, and what is the operational consequence? The provider only fetches at mount time (and on rotation polls). The consequence: a secret’s availability is coupled to a running pod and a working identity — no pod means no Secret, and a bad identity means the pod never starts.

Quick check

1. You want rotation to reach your app without a restart. Which consumption pattern do you choose, and which one do you avoid?
2. Your federated credential --subject reads system:serviceaccount:payments. What error will you see and when?
3. You mount objectType: cert and your ingress TLS handshake fails. What did you do wrong?
4. The synced Secret is empty though the file is present on the mount. What single field is misconfigured?
5. An ingress controller’s TLS Secret disappeared after a deploy. What lifecycle behaviour caused it and what is the fix?

Answers

1. Read from the mounted file or a mounted-Secret volume (both update in place on the next poll); avoid env vars, which are a start-time snapshot and never change in a running container.
2. AADSTS70021: No matching federated identity record found, at pod start (not at az/apply time). The subject is missing the SA-name segment; it must be system:serviceaccount:payments:<sa>.
3. You used objectType: cert, which returns the certificate only with no private key, so there is no tls.key. Use objectType: secret against the cert to get the full PEM chain plus key.
4. secretObjects.data.objectName. It must equal the mounted filename, which is the objectAlias if one is set and the Key Vault objectName only when no alias is set.
5. The synced Secret is garbage-collected when the last consuming pod is deleted. During the deploy all consuming pods rolled out and the Secret vanished. Run a keeper pod that holds the SPC mount so the Secret persists across rollouts.

Glossary

Secrets Store CSI Driver — A Kubernetes CSI driver that mounts external secret stores (here, Key Vault) as files on a volume.
Azure provider — The plugin (secrets-store-provider-azure) that authenticates to and fetches from Key Vault on the driver’s behalf.
SecretProviderClass (SPC) — A namespaced CRD describing which vault, identity, and objects to fetch, plus an optional sync block.
Workload Identity (Entra Workload ID) — Federated identity binding a Kubernetes service account to a managed identity with no stored client secret.
User-assigned managed identity (UAMI) — A standalone Azure identity you create, grant RBAC, and federate to a service account.
Federated identity credential (FIC) — The trust object on a UAMI specifying the OIDC issuer, subject, and audience that may assume it.
OIDC issuer — The cluster’s token-issuing endpoint URL; the issuer value in the FIC.
Workload-identity webhook — The mutating admission webhook that projects the OIDC token and AZURE_* env into labelled pods.
secretObjects — The SPC block that mirrors mounted objects into a native Kubernetes Secret.
objectAlias — The SPC field that renames the mounted file; the sync and any file consumer key off this name.
Poll interval — How often the driver checks the vault for changes when rotation is enabled (default 2 minutes).
tmpfs — A memory-backed filesystem; where mounted secret material lives, so it never hits node disk and reaches etcd only if you opt into the secretObjects sync.
Reloader — A controller that watches Secrets/ConfigMaps and triggers a rolling restart on change, used to refresh env-var consumers.
subPath — A volume-mount option that pins a single file/sub-directory; notably does not receive in-place updates.
Garbage collection (driver) — The driver’s removal of a synced Secret once the last consuming pod is deleted.

Next steps

Azure Key Vault with Workload Identity for secretless secrets — the auth pattern this article builds on, in depth.
Azure Key Vault secret rotation with managed identity — the vault-side rotation events the driver consumes.
Kubernetes ingress controllers, TLS and routing deep dive — where the Key Vault TLS-cert pattern lands in production.
Entra managed identities deep dive: user-assigned, FIC, RBAC — hardening the identity that backs every mount.
Kubernetes troubleshooting methodology: pods, nodes, networking, storage, RBAC — the fallback when a mount fails in ways the add-on cannot explain.