Secrets Store CSI Driver on AKS: Mounting Key Vault Secrets with Rotation and K8s Sync

The Secrets Store CSI Driver lets a pod mount Azure Key Vault secrets, keys, and certificates as files on a tmpfs volume, with no secret material written to etcd by default. On AKS it ships as a first-party add-on (azure-keyvault-secrets-provider): a managed DaemonSet, an Azure provider plugin, and platform-owned lifecycle rather than a Helm chart you babysit. You annotate a service account, write a SecretProviderClass, mount a CSI volume, and Key Vault material appears at /mnt/secrets-store inside the container — passwordless, auditable, and rotatable.

The interesting engineering is not the mount; it is the identity backing the mount and the rotation semantics once secrets change underneath running pods. Three facts trip up almost everyone the first time. The driver only fetches secrets when a pod mounts a volume — no pod, no Secret. The synced Kubernetes Secret is created on first mount and garbage-collected when the last consuming pod dies. And rotation reaches a file or a mounted Secret volume on the next poll, but an environment variable injected at container start never changes, so an app reading config from env keeps the old value forever unless you force a restart. Miss any of these and you ship an incident.

This walkthrough wires the add-on to Microsoft Entra Workload ID (federated, no client secrets), authors a SecretProviderClass, syncs the mounted objects into a native Kubernetes Secret for env-var consumption, terminates TLS at ingress from a Key Vault certificate, and turns on auto-rotation — with a clear-eyed view of what actually propagates and what does not. By the end you will read the symptom (ContainerCreating forever, AADSTS70021, an empty synced Secret, a rotated value that never reaches the app) and name the exact cause and fix in under a minute. Throughout, I prefer workload identity over the add-on’s auto-created managed identity: the add-on identity is a node-scoped, cluster-wide credential, while workload identity scopes access to a single service account — what you want for least privilege and clean auditing. The mechanics here build directly on Azure Key Vault with Workload Identity for secretless secrets and the broader Kubernetes ConfigMaps and Secrets deep dive.

What problem this solves

The naïve way to get a database password into a pod is a Kubernetes Secret you kubectl create from a literal, or worse, a value baked into a Helm values.yaml checked into git. Both put plaintext (base64 is not encryption) in etcd and in your version control, readable by anyone with get secret RBAC or repo access. There is no rotation story, no audit trail of which workload read which secret when, and no separation between the team that owns the secret and the team that runs the cluster. When an auditor asks “prove that pod A cannot read pod B’s database credential,” you have no answer.

The Secrets Store CSI Driver moves the source of truth to Key Vault and the identity to Entra. The secret lives in a vault with its own RBAC, versioning, soft-delete, purge protection, and diagnostic logs that record every SecretGet with the calling identity. The pod authenticates with a federated token — no client secret is ever issued or stored. Mounted material lands on tmpfs (memory-backed), never on disk, never in etcd unless you explicitly opt into the K8s sync. Rotation happens in the vault and the driver pulls the new version on a poll. The blast radius of a compromised pod shrinks from “the entire secret store” to “exactly the secrets that one service account was granted.”

Who hits the pain this solves: any team running multi-tenant AKS where squads share a cluster but must not share secrets; anyone failing a PCI/SOC2/ISO audit on secret attribution; teams whose “rotation” today means a manual kubectl edit secret and a prayer; and anyone who has shipped a credential to a public registry or git history. The cost of getting it wrong is concrete — a single shared identity means one popped pod reads everything, and your audit log attributes nothing.

To frame the field before the deep dive, here is every failure class this article covers, where it bites on the path, and the one place to look first:

| Symptom class | What you actually see | First question to ask | First place to look | Most common single cause |
| --- | --- | --- | --- | --- |
| Pod stuck ContainerCreating | Pod never reaches Running | Did the volume mount fail? | kubectl describe pod events | Federation subject mismatch or missing RBAC |
| AADSTS70021 | Event: “No matching federated identity record found” | Does the FIC subject match the SA? | az identity federated-credential list | --subject typo vs system:serviceaccount:<ns>:<sa> |
| 403 Forbidden from Key Vault | Event: “does not have secrets get permission” | Is the RBAC role correct for the object type? | az role assignment list --scope <kv> | Wrong role (Secrets vs Certificate User) |
| Empty synced Secret | kubectl get secret exists but data: {} | Does objectName match the filename? | Compare SPC objectName to mount ls | Wired to KV name, not the mounted alias |
| Rotated value never reaches app | New vault value, old app behaviour | How does the app read the secret? | Consumption pattern (file vs env) | Env var snapshot at container start |
| Synced Secret vanished | Ingress 503, TLS gone | Did the last consumer pod die? | kubectl get pods referencing the SPC | GC when last mounting pod deleted |

Learning objectives

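By the end of this article you can:

  1. Wire the add-on to Microsoft Entra Workload ID with a federated credential and no client secrets.
  2. Author a SecretProviderClass and mount Key Vault objects as files.
  3. Sync mounted objects into a native Kubernetes Secret for env-var consumption.
  4. Terminate TLS at ingress from a Key Vault certificate.
  5. Turn on auto-rotation and explain what actually propagates and what does not.
  6. Read a symptom (ContainerCreating forever, AADSTS70021, an empty synced Secret, a rotated value that never reaches the app) and name the cause and fix.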

Prerequisites & where this fits

You should be comfortable with AKS fundamentals — a cluster, kubectl, namespaces, and the difference between a Deployment and a Pod. You should understand what a Kubernetes Secret is and that base64 is encoding, not encryption (the ConfigMaps and Secrets deep dive covers this). You should know what a managed identity is at a basic level — see Entra managed identities deep dive — and have a Key Vault you can grant RBAC on. Familiarity with Kubernetes RBAC and service accounts makes the federation step click.

This sits in the Security / platform layer of an AKS deep-dive track. Upstream of it: Azure Key Vault secrets, keys, and certificates (what is in the vault) and Key Vault with Workload Identity (the auth pattern). Adjacent: Azure Key Vault secret rotation with managed identity (the vault-side rotation event this article consumes) and Kubernetes ingress controllers, TLS and routing (where the TLS cert pattern lands). When mounts fail in ways the add-on cannot explain, the general Kubernetes troubleshooting methodology is your fallback.

A quick map of who owns what during an incident, so you call the right person fast:

| Layer | What lives here | Who usually owns it | Failure classes it can cause |
| --- | --- | --- | --- |
| Key Vault (data plane) | Secrets, keys, certs, RBAC, firewall | Security / platform | 403, object-not-found, firewall block |
| Entra (identity) | UAMI, federated credential, tenant | Identity team | AADSTS70021, AADSTS700016 |
| AKS control plane | OIDC issuer, workload-identity webhook | Platform / SRE | Token not projected, webhook off |
| CSI driver + provider | DaemonSet in kube-system, the mount | Platform (add-on) | Mount hang, provider crash, rotation lag |
| SecretProviderClass (CRD) | Namespaced config, objects, sync map | App team | Empty Secret, wrong alias, parse error |
| Pod / Deployment | SA name, label, volume, env wiring | App / dev team | Missing label, wrong SA, env staleness |

Core concepts

Five mental models make every later diagnosis obvious.

The mount is pull-based and identity-gated. The driver does nothing until a pod mounts a CSI volume that references a SecretProviderClass. At mount time the Azure provider exchanges the pod’s projected service-account token for an Entra token (via the federated credential), calls the Key Vault data plane, and writes the returned objects to a tmpfs volume. No pod means no fetch; a bad identity means the mount fails and the pod never starts. This is the single most important behavioural fact: the secret’s availability is coupled to a running pod and a working identity.

Workload Identity is federation, not a stored secret. A user-assigned managed identity (UAMI) is bound to a Kubernetes service account by a federated identity credential (FIC) whose subject is system:serviceaccount:<namespace>:<name> and whose issuer is the cluster’s OIDC issuer URL. When a labelled pod runs, the webhook projects a short-lived OIDC token; Entra trusts it because the FIC says “this issuer + this subject = this identity.” No client secret exists anywhere. The audience is the fixed string api://AzureADTokenExchange.
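To make the "issuer + subject = identity" match concrete, you can decode a projected token and compare its claims with what the federated credential expects. The following is a minimal Python sketch, not part of the add-on: `decode_claims` and `fic_mismatches` are hypothetical helper names, the signature is deliberately not verified, and the issuer you pass in is assumed to be your cluster's OIDC issuer URL.

```python
import base64
import json

EXPECTED_AUDIENCE = "api://AzureADTokenExchange"


def decode_claims(jwt: str) -> dict:
    """Return the payload claims of a JWT without verifying its signature."""
    payload_b64 = jwt.split(".")[1]
    # base64url payloads omit padding; restore it before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def fic_mismatches(claims: dict, issuer: str, namespace: str, sa_name: str) -> list:
    """List which of iss / sub / aud differ from what the FIC expects."""
    expected = {
        "iss": issuer,
        "sub": f"system:serviceaccount:{namespace}:{sa_name}",
    }
    problems = [name for name, value in expected.items() if claims.get(name) != value]
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if EXPECTED_AUDIENCE not in audiences:
        problems.append("aud")
    return problems
```

An empty list means the three claims line up with the credential. Any entry names the field to fix first. Inside a labelled pod, the token would typically be read from the file that the webhook's AZURE_FEDERATED_TOKEN_FILE variable points at, assuming the standard workload-identity injection.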

The Key Vault object model is not the intuitive crypto/secret split. A Key Vault certificate is internally backed by both a key and a secret. The provider retrieves key and cert object types through the certificate path, so both require Key Vault Certificate User — not “Key”, which is for crypto operations the provider never performs. Mounting a full PEM chain (cert + private key) uses objectType: secret against the certificate, which needs Key Vault Secrets User. Getting this wrong yields a 403 at mount, not at apply.

The synced Kubernetes Secret has a driver-owned lifecycle. The secretObjects block mirrors mounted files into a real Secret, but that Secret is created on first mount (not on SPC apply) and garbage-collected when the last consuming pod is deleted. Anything that reads the Secret before the first pod (a Helm pre-install hook, an unrelated Deployment) gets not-found; anything that depends on it persisting after the pods are gone (an ingress controller) breaks when it vanishes. The driver owns it; do not treat it as an independent object.

Rotation lag is real and consumption-dependent. With rotation on, the driver polls every interval (default 2 minutes) and, on a detected change, updates both the mounted file and the synced Secret. A file consumer sees the new value on next read; a Secret-as-volume consumer sees it in place; a Secret-as-env-var consumer sees nothing because env is a point-in-time snapshot at container start. Worst-case lag is roughly one poll interval plus the kubelet’s atomic-write sync — design for “live within a poll interval,” never sub-second.

The vocabulary in one table

Before the deep sections, pin down every moving part. The glossary repeats these for lookup; this is the mental model side by side:

| Concept | One-line definition | Where it lives | Why it matters here |
| --- | --- | --- | --- |
| Add-on | azure-keyvault-secrets-provider | AKS managed | Installs driver + Azure provider |
| CSI driver | Generic Secrets Store CSI DaemonSet | kube-system | Mounts the volume, runs rotation poll |
| Azure provider | Plugin that talks to Key Vault | kube-system | Auth + fetch from the vault |
| SecretProviderClass | Namespaced CRD describing what to fetch | Per namespace | The config you author |
| UAMI | User-assigned managed identity | Resource group | The identity that reads the vault |
| Federated credential (FIC) | Trust binding SA ↔ UAMI | On the UAMI | No client secret; subject must match |
| OIDC issuer | Cluster token issuer URL | AKS control plane | The issuer in the FIC |
| Workload-identity webhook | Injects token + AZURE_* env | AKS (add-on) | Triggered by the pod label |
| secretObjects | Block that syncs mount → K8s Secret | In the SPC | Optional; needed for env vars |
| objectAlias | Renames the mounted file | In the SPC objects | Sync keys off this filename |
| Poll interval | How often rotation checks the vault | Add-on config | Default 2m; trade freshness vs API load |
| tmpfs | In-memory mount backing | Node | Why secrets never hit disk/etcd |

Enable the add-on and choose your identity model

The add-on installs the Secrets Store CSI Driver plus the Azure provider. On an existing cluster:

export RESOURCE_GROUP=rg-platform
export CLUSTER_NAME=aks-platform

az aks enable-addons \
  --addons azure-keyvault-secrets-provider \
  --name $CLUSTER_NAME \
  --resource-group $RESOURCE_GROUP

Enabling the add-on always creates a user-assigned managed identity named azurekeyvaultsecretsprovider-<cluster> in the node resource group (MC_...) and assigns it to the node VMSS. You cannot prevent its creation, but you do not have to use it. There are exactly two ways to authenticate to Key Vault, and the choice is a security decision, not a convenience one:

| Model | Credential scope | Audit granularity | Best for | Blast radius if pod compromised |
| --- | --- | --- | --- | --- |
| Add-on managed identity | Node-level, shared by every pod | One identity for all reads (unattributable) | Quick demos, single-tenant clusters | Every secret the identity can read |
| Workload ID (recommended) | Per Kubernetes service account | Per-SA attribution in vault logs | Multi-team clusters, least privilege, PCI/SOC2 | Only that SA’s granted secrets |

Workload ID requires the OIDC issuer and the workload-identity webhook. If you create the cluster fresh, include both flags:

az aks create \
  --name $CLUSTER_NAME \
  --resource-group $RESOURCE_GROUP \
  --enable-addons azure-keyvault-secrets-provider \
  --enable-oidc-issuer \
  --enable-workload-identity \
  --generate-ssh-keys

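On an existing cluster, enable them in place. The update is idempotent and runs as a control-plane reconcile, so pods keep running. The webhook only mutates pods at creation, though, so workloads that predate it get a projected token only once they are recreated: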

az aks update \
  --name $CLUSTER_NAME \
  --resource-group $RESOURCE_GROUP \
  --enable-oidc-issuer \
  --enable-workload-identity

The same in Bicep, for the cluster resource:

resource aks 'Microsoft.ContainerService/managedClusters@2024-09-01' = {
  name: clusterName
  location: location
  identity: { type: 'SystemAssigned' }
  properties: {
    oidcIssuerProfile: { enabled: true }
    securityProfile: {
      workloadIdentity: { enabled: true }
    }
    addonProfiles: {
      azureKeyvaultSecretsProvider: {
        enabled: true
        config: {
          enableSecretRotation: 'true'
          rotationPollInterval: '2m'
        }
      }
    }
  }
}

Every flag in this group, what it does, and the cost of getting it wrong:

| Flag / setting | What it enables | Default | Required for Workload ID? | If omitted |
| --- | --- | --- | --- | --- |
| --enable-addons azure-keyvault-secrets-provider | Driver + Azure provider DaemonSets | off | Yes | No mount capability at all |
| --enable-oidc-issuer | Cluster OIDC issuer URL | off | Yes | No issuer to put in the FIC |
| --enable-workload-identity | Mutating webhook for token projection | off | Yes | Token never projected; auth fails |
| enableSecretRotation (config) | Rotation poll loop | false | No | Secrets fetched once, never refreshed |
| rotationPollInterval (config) | Poll cadence | 2m | No | n/a (only with rotation on) |

Confirm the add-on landed and grab the auto-created identity details (useful for inventory even if you do not use it):

az aks show \
  --name $CLUSTER_NAME \
  --resource-group $RESOURCE_GROUP \
  --query addonProfiles.azureKeyvaultSecretsProvider

Example output:

{
  "config": { "enableSecretRotation": "false", "rotationPollInterval": "2m" },
  "enabled": true,
  "identity": {
    "clientId": "00001111-aaaa-2222-bbbb-3333cccc4444",
    "objectId": "aaaaaaaa-0000-1111-2222-bbbbbbbbbbbb",
    "resourceId": ".../userAssignedIdentities/azurekeyvaultsecretsprovider-aksplatform"
  }
}
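
If you script this check, note that the profile reports its settings as strings, so `"false"` is a non-empty (truthy) value in most languages. A small Python sketch of reading the profile safely follows; `rotation_settings` is a hypothetical helper, and the fallback of 2m simply mirrors the default stated in this article.

```python
import json


def rotation_settings(addon_profile_json: str) -> tuple:
    """Return (rotation_enabled, poll_interval) from the add-on profile JSON.

    Both values arrive as strings, so "false" is compared as text rather
    than relied on for truthiness.
    """
    profile = json.loads(addon_profile_json)
    config = profile.get("config") or {}
    enabled = str(config.get("enableSecretRotation", "false")).lower() == "true"
    return enabled, config.get("rotationPollInterval", "2m")
```

Fed the output above, this returns `(False, "2m")`, which tells you the rotation poll loop is not running yet.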

The driver and provider run as DaemonSets in kube-system:

kubectl get pods -n kube-system \
  -l 'app in (secrets-store-csi-driver,secrets-store-provider-azure)' -o wide

You should see aks-secrets-store-csi-driver-* (3/3 containers) and aks-secrets-store-provider-azure-* (1/1), one of each per node. What each managed component is and how to sanity-check it:

| Component | Kind | Namespace | Healthy signal | If unhealthy |
| --- | --- | --- | --- | --- |
| aks-secrets-store-csi-driver | DaemonSet | kube-system | 3/3 per node, Running | Mounts hang cluster-wide |
| aks-secrets-store-provider-azure | DaemonSet | kube-system | 1/1 per node, Running | Azure auth/fetch fails |
| secretproviderclasses.secrets-store.csi.x-k8s.io | CRD | cluster | kubectl get crd lists it | SPC apply errors |
| secretproviderclasspodstatuses | CRD | cluster | per-pod fetch status object | No status → fetch never ran |
| Add-on UAMI | Identity | MC_* RG | exists, VMSS-assigned | Add-on identity auth fails |

Federate a managed identity to a service account

The heart of the passwordless model: create a user-assigned identity, grant it data-plane RBAC on the vault, then bind it to a service account via a federated credential. No client secret is ever issued.

export UAMI=id-app-secrets
export KEYVAULT_NAME=kv-platform-prod
export SA_NAME=app-sa
export SA_NAMESPACE=payments

# 1. Create the workload identity
az identity create --name $UAMI --resource-group $RESOURCE_GROUP

export USER_ASSIGNED_CLIENT_ID=$(az identity show \
  --resource-group $RESOURCE_GROUP --name $UAMI --query clientId -o tsv)
export IDENTITY_TENANT=$(az aks show \
  --name $CLUSTER_NAME --resource-group $RESOURCE_GROUP \
  --query identity.tenantId -o tsv)

The RBAC role depends on the object type, not the crypto/secret intuition

With an RBAC-enabled vault, the role you assign depends on what you mount — and this trips people up constantly because it does not follow the obvious split:

| Object type in SecretProviderClass | Required built-in role | What you get back | Why this role |
| --- | --- | --- | --- |
| secret | Key Vault Secrets User | The secret value (or full PEM chain if against a cert) | Secret data-plane read |
| key | Key Vault Certificate User | The public key (PEM) | Retrieved via the cert path |
| cert | Key Vault Certificate User | The certificate only (PEM, no chain) | Retrieved via the cert path |

Per the AKS docs, both key and cert object types require Key Vault Certificate User — the provider retrieves them through the certificate path, not a crypto operation. Because a Key Vault certificate is internally backed by both a key and a secret, mounting a full cert chain via objectType: secret against a certificate needs Key Vault Secrets User. Assign only what you mount:

export KEYVAULT_SCOPE=$(az keyvault show --name $KEYVAULT_NAME --query id -o tsv)

az role assignment create \
  --role "Key Vault Secrets User" \
  --assignee $USER_ASSIGNED_CLIENT_ID \
  --scope $KEYVAULT_SCOPE
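
"Assign only what you mount" reduces to a lookup over the objectType values in your SPC. The sketch below encodes the role mapping described above as a Python table; `ROLE_BY_OBJECT_TYPE` and `required_roles` are illustrative names, and the mapping is exactly the one this article states, not something queried from Azure.

```python
ROLE_BY_OBJECT_TYPE = {
    "secret": "Key Vault Secrets User",
    "key": "Key Vault Certificate User",
    "cert": "Key Vault Certificate User",
}


def required_roles(object_types):
    """Return the sorted, de-duplicated roles needed for the given objectType values."""
    unknown = sorted(set(object_types) - set(ROLE_BY_OBJECT_TYPE))
    if unknown:
        raise ValueError(f"unsupported objectType: {unknown}")
    return sorted({ROLE_BY_OBJECT_TYPE[t] for t in object_types})
```

For an SPC that mounts two secrets and one key, the result is two roles, so a single Secrets User assignment would leave the key object failing with a 403 at mount.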

A reference of the relevant Key Vault data-plane roles, so you grant the minimum and never reach for an admin role:

| Role | Data-plane permissions | Use for | Do NOT use when |
| --- | --- | --- | --- |
| Key Vault Secrets User | get/list secrets | objectType: secret, full PEM chains | You only need cert public parts |
| Key Vault Certificate User | get/list certificates | objectType: key, objectType: cert | You mount the private key chain |
| Key Vault Crypto User | wrap/unwrap, sign/verify | Crypto ops (not this provider) | Mounting via CSI (never needed) |
| Key Vault Secrets Officer | full secret CRUD | Pipelines that write secrets | A read-only mount identity |
| Key Vault Administrator | full data-plane admin | Break-glass / setup | Any workload identity (over-privileged) |

RBAC vs the legacy access-policy model — know which your vault uses, because the wrong one silently grants nothing:

| Authorization mode | How access is granted | How to grant the mount identity | Detect with |
| --- | --- | --- | --- |
| Azure RBAC (recommended) | az role assignment create at vault/secret scope | The role table above | enableRbacAuthorization: true on the vault |
| Vault access policy (legacy) | az keyvault set-policy --secret-permissions get | Per-identity policy, get/list | enableRbacAuthorization: false |
| Mixed (migration) | Both can be present | Match whichever the vault enforces | Vault property + a test mount |

Bind the federated credential

Get the cluster’s OIDC issuer URL, create the service account annotated with the identity’s client ID, then bind the federated credential to the system:serviceaccount:<ns>:<name> subject:

export AKS_OIDC_ISSUER=$(az aks show \
  --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME \
  --query oidcIssuerProfile.issuerUrl -o tsv)

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ${SA_NAME}
  namespace: ${SA_NAMESPACE}
  annotations:
    azure.workload.identity/client-id: ${USER_ASSIGNED_CLIENT_ID}
EOF

az identity federated-credential create \
  --name fic-app-secrets \
  --identity-name $UAMI \
  --resource-group $RESOURCE_GROUP \
  --issuer ${AKS_OIDC_ISSUER} \
  --subject system:serviceaccount:${SA_NAMESPACE}:${SA_NAME} \
  --audience api://AzureADTokenExchange

The --subject must match the namespace and service account name exactly. A typo here produces the single most common failure mode — AADSTS70021: No matching federated identity record found — at pod startup, not at apply time. Every field in the FIC, what it must equal, and the error if it is wrong:

| FIC field | Must equal | Source of truth | Error if wrong |
| --- | --- | --- | --- |
| issuer | Cluster OIDC issuer URL | az aks show ... oidcIssuerProfile.issuerUrl | AADSTS70021 (no match) |
| subject | system:serviceaccount:<ns>:<sa> | The pod’s namespace + SA name | AADSTS70021 (no match) |
| audience | api://AzureADTokenExchange | Fixed constant | AADSTS700016 / invalid audience |
| (SA annotation) client-id | UAMI clientId | az identity show ... clientId | Token requested for wrong identity |
| (Pod) label use: "true" | Literal string | Required for webhook | Token never projected at all |

Common subject-mismatch shapes — eyeball this when AADSTS70021 appears:

| What you wrote | Why it fails | Correct form |
| --- | --- | --- |
| system:serviceaccount:payments | Missing the SA name segment | system:serviceaccount:payments:app-sa |
| system:serviceaccount:Payments:app-sa | Namespace is case-sensitive | system:serviceaccount:payments:app-sa |
| system:serviceaccount:payments:app_sa | SA name typo (_ vs -) | system:serviceaccount:payments:app-sa |
| serviceaccount:payments:app-sa | Missing system: prefix | system:serviceaccount:payments:app-sa |
| Right subject, pod in default ns | Pod ran in a different namespace | Run pod in payments, or add a 2nd FIC |
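
The same mismatch shapes can be checked mechanically before you ever start a pod. This is a rough Python sketch under the assumption that you know the namespace and service account the pod will actually run as; `diagnose_subject` is a hypothetical helper and its messages are illustrative wording, not Entra output.

```python
def diagnose_subject(written: str, namespace: str, sa_name: str) -> str:
    """Explain why a federated-credential subject would not match the pod's identity."""
    expected = f"system:serviceaccount:{namespace}:{sa_name}"
    if written == expected:
        return "ok"
    parts = written.split(":")
    if parts[:2] != ["system", "serviceaccount"]:
        return "missing system:serviceaccount: prefix"
    if len(parts) != 4:
        return "expected exactly <namespace>:<name> after the prefix"
    ns, _name = parts[2], parts[3]
    if ns != namespace:
        # The comparison Entra performs is exact, so case alone breaks the match
        if ns.lower() == namespace.lower():
            return "namespace differs only by case"
        return "namespace mismatch"
    return "service account name mismatch"
```

Running it over the subject returned by `az identity federated-credential list` turns an opaque AADSTS70021 into one of five concrete reasons.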

Author the SecretProviderClass

The SecretProviderClass (SPC) is a namespaced CRD that tells the Azure provider which vault to hit, how to authenticate, and which objects to fetch. For workload identity, the two load-bearing settings are usePodIdentity: "false" and clientID set to the workload identity’s client ID — that combination signals the provider to use the projected service-account token.

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: app-kv
  namespace: payments
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"
    clientID: "00001111-aaaa-2222-bbbb-3333cccc4444"  # workload identity clientId
    keyvaultName: "kv-platform-prod"
    cloudName: ""                                      # defaults to AzurePublicCloud
    tenantId: "ffffffff-1111-2222-3333-444444444444"
    objects: |
      array:
        - |
          objectName: db-connection-string
          objectType: secret
          objectVersion: ""        # empty = latest
        - |
          objectName: signing-key
          objectType: key
          objectVersion: ""
        - |
          objectName: tls-app
          objectType: secret       # full PEM chain + private key
          objectAlias: tls-app.pem

Every parameter, end to end

The full parameters surface for the Azure provider — what each does, its default, and the gotcha:

| Parameter | Purpose | Default | When to change | Gotcha |
| --- | --- | --- | --- | --- |
| usePodIdentity | Selects pod-identity (deprecated) path | "false" | Never for Workload ID | Must be the string "false", not bool |
| useVMManagedIdentity | Use node/add-on MI instead of WI | "false" | Add-on-identity model only | Mutually exclusive with clientID WI |
| clientID | Workload identity client ID | (none) | Always for Workload ID | Must match the SA annotation |
| keyvaultName | Vault short name (not URL) | (none) | Per vault | Name only, e.g. kv-platform-prod |
| cloudName | Azure cloud | "" → Public | Gov / China clouds | Leave empty for commercial Azure |
| tenantId | Entra tenant GUID | (none) | Always | Cross-tenant needs the resource tenant |
| objects | Nested YAML of objects to fetch | (none) | Always | A YAML string; the block-scalar pipe after objects: and after each dash is required |

The nested objects document is the part that breaks most first attempts. Each item’s fields:

| objects item field | Purpose | Values | Empty / default behaviour |
| --- | --- | --- | --- |
| objectName | Key Vault object name | the vault object’s name | required |
| objectType | Which object model path | secret, key, or cert | required |
| objectVersion | Pin a specific version | a version GUID or "" | "" = latest (do this for rotation) |
| objectAlias | Rename the mounted file | any filename | defaults to objectName |
| objectEncoding | Decode base64 secrets | utf-8, base64, or hex | utf-8; use base64 for binary |
| objectFormat | PEM vs PFX for certs | pem or pfx | pem |
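
Because `objects` is a string that itself contains YAML, indentation mistakes are easy to make by hand. One way to sidestep them is to generate the string. The sketch below is a minimal Python illustration that assumes every value is a simple scalar needing no YAML escaping; `render_objects` is a hypothetical helper, not a provider API.

```python
def render_objects(items):
    """Render the provider's nested `objects` string from a list of dicts."""
    lines = ["array:"]
    for item in items:
        # Each array entry is its own literal block scalar
        lines.append("  - |")
        for field, value in item.items():
            # Keep empty strings visible so `objectVersion: ""` survives
            rendered = '""' if value == "" else value
            lines.append(f"    {field}: {rendered}")
    return "\n".join(lines) + "\n"
```

The returned text is what sits under `objects: |` in the SPC, indented to match the surrounding document.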

A few things worth internalizing:

For certificates, recall the Key Vault object model — the three object types return materially different things:

| objectType | Returns | PEM contents | Typical consumer |
| --- | --- | --- | --- |
| key | Public key only | public key block | JWT signature verification |
| cert | Certificate only | certificate block, no chain, no key | Client-cert pinning, display |
| secret (vs a cert) | Private key + full cert chain | key block + cert chain | Ingress TLS termination |

Ingress controllers want the last one. The version-pinning decision, spelled out:

| objectVersion value | Behaviour | Rotation picks up? | Use when |
| --- | --- | --- | --- |
| "" (empty) | Always latest enabled version | Yes | Default; you want rotation |
| explicit version GUID | Frozen to that version | No | Compliance freeze, repro of an incident |
| disabled latest version | Mount fails until re-enabled | n/a | Never intentionally |

Mount as a volume and the startup coupling

The driver only fetches secrets when a pod mounts a CSI volume referencing the SPC — no pod, no secret, by design. The sharp consequence: the volume must mount successfully for the pod to start. If the identity is misconfigured or the object does not exist, the pod stays in ContainerCreating and you read the reason from events.

apiVersion: v1
kind: Pod
metadata:
  name: payments-api
  namespace: payments
  labels:
    azure.workload.identity/use: "true"   # required: opt the pod into the webhook
spec:
  serviceAccountName: app-sa              # must match the federated subject
  containers:
    - name: api
      image: ghcr.io/acme/payments-api:1.8.2
      volumeMounts:
        - name: kv
          mountPath: /mnt/secrets-store
          readOnly: true
  volumes:
    - name: kv
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: "app-kv"

Two non-negotiables here: the pod label azure.workload.identity/use: "true" (this makes the webhook inject the projected token and the AZURE_* env vars) and serviceAccountName matching the federated subject. Miss the label and the token is never projected; the provider fails with an auth error. The required volume/pod fields and what each one is for:

| Field | Required value | Purpose | If wrong/missing |
| --- | --- | --- | --- |
| labels.azure.workload.identity/use | "true" | Opts pod into the webhook | No token projected → auth error |
| serviceAccountName | the federated SA | Identity the FIC trusts | AADSTS70021 at mount |
| volumes[].csi.driver | secrets-store.csi.k8s.io | Selects the CSI driver | Volume not recognized |
| volumeAttributes.secretProviderClass | the SPC name | Which SPC to use | SPC not found, mount fails |
| volumeMounts[].readOnly | true | Secrets are read-only | Write attempts fail |
| volumeMounts[].mountPath | e.g. /mnt/secrets-store | Where files appear | App looks in the wrong place |

The secrets land as files:

kubectl exec -n payments payments-api -- ls /mnt/secrets-store/
# db-connection-string  signing-key  tls-app.pem

If the pod is stuck, the events tell you exactly which gate failed — and these strings map one-to-one to a cause:

| Event substring (kubectl describe pod) | Root cause | Fix |
| --- | --- | --- |
| failed to get key vault token / no matching federated identity | FIC subject/issuer mismatch | Fix --subject, re-create the FIC |
| does not have secrets get permission | Missing/incorrect Key Vault RBAC | Assign the right role at vault scope |
| Secret not found / SecretNotFound | objectName not in the vault | Correct the name or create the object |
| MountVolume.SetUp failed (provider) | Provider DaemonSet unhealthy | Check aks-secrets-store-provider-azure |
| client-id ... not found | SA annotation client-id wrong | Match annotation to UAMI clientId |
| Pod Running but no token env | Missing the use: "true" label | Add the label, recreate the pod |
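
If you triage these events often, the substring table translates directly into a lookup. The Python sketch below is illustrative only: `classify_event` is a hypothetical helper, the substrings are the ones listed above, and the sample messages in any test are invented rather than captured from a cluster.

```python
# Ordered from most specific to least: kubelet prefixes most mount failures
# with "MountVolume.SetUp failed", so that generic match has to come last.
EVENT_HINTS = [
    (("no matching federated identity",), "FIC subject/issuer mismatch"),
    (("failed to get key vault token",), "FIC subject/issuer mismatch"),
    (("does not have secrets get permission",), "Missing/incorrect Key Vault RBAC"),
    (("secretnotfound",), "objectName not in the vault"),
    (("secret not found",), "objectName not in the vault"),
    (("client-id", "not found"), "SA annotation client-id wrong"),
    (("mountvolume.setup failed",), "Provider DaemonSet unhealthy"),
]


def classify_event(message: str) -> str:
    """Map a pod event message to the most specific known root cause."""
    text = message.lower()
    for fragments, cause in EVENT_HINTS:
        if all(fragment in text for fragment in fragments):
            return cause
    return "unrecognized"
```

The ordering is the design choice that matters: checking the generic mount-failure string first would attribute every identity and RBAC problem to the provider DaemonSet.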

Sync mounted objects into a native Kubernetes Secret

Files on a volume suit apps that read from disk, but most apps want env vars, and env vars come from a Kubernetes Secret. The driver mirrors mounted content into a real Secret via the secretObjects block. Add it to the same SPC:

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: app-kv
  namespace: payments
spec:
  provider: azure
  secretObjects:
    - secretName: app-db
      type: Opaque
      data:
        - objectName: db-connection-string   # must match the mounted FILENAME
          key: DB_CONNECTION_STRING           # key inside the K8s Secret
  parameters:
    # ... unchanged from the SPC above ...

The secretObjects schema, field by field:

| secretObjects field | Purpose | Values | Gotcha |
| --- | --- | --- | --- |
| secretName | Name of the K8s Secret created | any DNS-1123 name | Must be unique per namespace |
| type | Kubernetes Secret type | Opaque, kubernetes.io/tls, or kubernetes.io/dockerconfigjson | TLS type needs both tls.crt+tls.key |
| data[].objectName | The mounted filename | alias if set, else objectName | NOT the Key Vault name (the #1 trap) |
| data[].key | Key inside the K8s Secret | any key name | This is what secretKeyRef references |
| labels / annotations | Metadata on the Secret | maps | Useful for Reloader auto-discovery |

The critical rule: objectName under secretObjects.data must match the mounted filename, which is objectAlias if you set one, otherwise objectName. People wire this to the Key Vault object name and get an empty Secret. The exact mapping chain, so you can trace it end to end:

| Stage | Value in the example | Determined by |
| --- | --- | --- |
| Key Vault object | tls-app | The vault |
| objects item objectName | tls-app | Your SPC |
| objects item objectAlias | tls-app.pem | Your SPC (renames the file) |
| Mounted filename | tls-app.pem | The alias (or objectName if no alias) |
| secretObjects.data.objectName | tls-app.pem | Must equal the mounted filename |
| secretObjects.data.key | tls.crt | The K8s Secret key your app reads |
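
The mapping chain is simple enough to lint before you apply the SPC. Here is a short Python sketch that assumes you have already parsed the SPC's `objects` items and `secretObjects` block into plain dicts and lists; both function names are hypothetical.

```python
def mounted_filenames(objects):
    """Filename each object is mounted as: objectAlias if set, else objectName."""
    return {obj.get("objectAlias") or obj["objectName"] for obj in objects}


def unmatched_sync_entries(objects, secret_objects):
    """Return secretObjects data[].objectName values that match no mounted file."""
    files = mounted_filenames(objects)
    return [
        entry["objectName"]
        for secret in secret_objects
        for entry in secret.get("data", [])
        if entry["objectName"] not in files
    ]
```

Any name this returns is a sync entry pointing at a file that will never exist, which is the usual shape of the empty-Secret symptom.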

Now consume it as an env var:

      env:
        - name: DB_CONNECTION_STRING
          valueFrom:
            secretKeyRef:
              name: app-db
              key: DB_CONNECTION_STRING

Two lifecycle facts that are easy to miss and cause incidents:

  1. The synced Secret only exists after at least one pod mounts the volume. It is created on first mount, not on SPC apply. Anything that reads the Secret before that first pod (a Helm pre-install hook, another Deployment) gets a not-found.
  2. The synced Secret is garbage-collected when the last consuming pod is deleted. The driver owns its lifecycle. Do not point unrelated workloads at it expecting it to persist independently.

The supported synced-Secret types and when to reach for each:

| Synced Secret type | Required keys | Produced from | Consumer |
| --- | --- | --- | --- |
| Opaque | any | one secret per key | env vars, generic file config |
| kubernetes.io/tls | tls.crt, tls.key | objectType: secret against a cert | Ingress TLS |
| kubernetes.io/dockerconfigjson | .dockerconfigjson | a secret holding registry JSON | imagePullSecrets |
| kubernetes.io/basic-auth | username, password | two secrets | basic-auth middleware |

Enable auto-rotation, tune the poll interval, and understand propagation

Rotation is off by default. It is an add-on-level setting, not an SPC field. Enable it and optionally widen the poll interval:

az aks addon update \
  --resource-group $RESOURCE_GROUP \
  --name $CLUSTER_NAME \
  --addon azure-keyvault-secrets-provider \
  --enable-secret-rotation \
  --rotation-poll-interval 5m

The default poll interval is 2 minutes. The driver polls every interval, and on a detected change updates both the mounted file content and the synced Kubernetes Secret. Now the part nobody reads carefully — what actually reaches the application:

| Consumption pattern | Picks up rotation automatically? | What you must do | Lag |
| --- | --- | --- | --- |
| App reads the mounted file | Yes, on next poll | App must re-read the file (watch or re-open per request) | ≤ 1 poll + sync |
| App reads the synced Secret as a volume | Yes, on next poll | Mounted Secret volume contents update in place | ≤ 1 poll + kubelet sync |
| App reads the synced Secret as an env var | No | Restart the pod (env is a start-time snapshot) | until restart |
| App reads via subPath mount | No | Remove subPath; mount the whole volume | never (kubelet limitation) |

That env-var row is the trap. Environment variables are a point-in-time snapshot taken at container start. Rotating the secret updates the Kubernetes Secret, but a running container’s env block never changes. To close the loop, run something like Reloader, which watches synced Secrets and triggers a rolling restart:

metadata:
  annotations:
    reloader.stakater.com/auto: "true"

There is also a known Kubernetes limitation orthogonal to all of this: a Secret or ConfigMap mounted via subPath does not receive updates — that is a kubelet behavior, not a driver bug. Mount the whole volume, not a subPath, if you want in-place updates.

The poll-interval trade-off — faster polling is fresher but hits the Key Vault data plane more often:

| Poll interval | Freshness (worst-case lag) | Key Vault API pressure | Use for |
|---|---|---|---|
| 30s (minimum practical) | ~30s + sync | Highest; watch throttling at scale | Tight rotation SLAs, few pods |
| 2m (default) | ~2m + sync | Moderate | Most workloads |
| 5m | ~5m + sync | Low | Large fleets, relaxed SLAs |
| 30m+ | ~30m + sync | Minimal | Rarely-rotated secrets, cost-sensitive |

Key Vault throttling is real — the data plane has request limits, and a large fleet polling aggressively can hit them:

| Pressure source | Symptom | Confirm | Mitigation |
|---|---|---|---|
| Too-short poll × many pods | 429 Too Many Requests in provider logs | Provider DaemonSet logs; KV metrics | Widen poll interval; fewer objects per SPC |
| Many distinct vaults polled | Aggregate request rate high | KV ServiceApiHit metric | Consolidate or stagger |
| Large objects arrays | Each poll fetches all objects | Count objects per SPC | Split SPCs; pin rarely-changing objects |

Set realistic expectations on lag. Worst-case time from a Key Vault write to a file update is roughly one poll interval plus the kubelet’s atomic-write sync (usually under a minute). With env vars and Reloader, add the rollout time. Do not assume sub-second rotation; design for “new value is live within a poll interval, connections re-established on next use.”
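The lag arithmetic above can be sketched in a few lines of shell. This is only a back-of-the-envelope helper: the function names are hypothetical, and the 60-second sync allowance is an assumption drawn from the "usually under a minute" estimate, not a figure published by the driver.

```shell
# Convert an interval such as "30s", "2m" or "1h" to seconds (hypothetical helper).
interval_to_seconds() (
  value=${1%[smh]}            # numeric part, with a trailing s/m/h removed
  unit=${1#"$value"}          # the unit letter, or empty if none was given
  case "$value" in
    ''|*[!0-9]*) echo "not a valid interval: $1" >&2; exit 1 ;;
  esac
  case "$unit" in
    s|'') echo "$value" ;;
    m)    echo $(( value * 60 )) ;;
    h)    echo $(( value * 3600 )) ;;
  esac
)

# Worst-case lag = poll interval + sync allowance + optional rollout time.
# args: <poll-interval> [sync-seconds, assumed default 60] [rollout-seconds, default 0]
worst_case_lag_seconds() (
  poll=$(interval_to_seconds "$1") || exit 1
  sync=${2:-60}
  rollout=${3:-0}
  echo $(( poll + sync + rollout ))
)
```

Calling worst_case_lag_seconds 5m 60 90 models a 5-minute poll, a one-minute sync allowance, and a 90-second Reloader rollout; substitute rollout times measured on your own workloads.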

TLS certificate consumption for ingress

A common goal is terminating TLS at an ingress controller with a cert that lives in Key Vault and rotates automatically. The pattern: mount objectType: secret against the certificate name (yielding the full PEM chain plus private key), sync it into a kubernetes.io/tls Secret, and point the Ingress at that Secret. This is the ingress-side complement to Kubernetes ingress controllers, TLS and routing.

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: app-tls
  namespace: ingress
spec:
  provider: azure
  secretObjects:
    - secretName: app-tls
      type: kubernetes.io/tls
      data:
        - objectName: tls-app.pem      # the mounted filename (objectAlias below)
          key: tls.crt
        - objectName: tls-app.pem
          key: tls.key
  parameters:
    usePodIdentity: "false"
    clientID: "00001111-aaaa-2222-bbbb-3333cccc4444"
    keyvaultName: "kv-platform-prod"
    tenantId: "ffffffff-1111-2222-3333-444444444444"
    objects: |
      array:
        - |
          objectName: app-cert
          objectType: secret           # full chain + key
          objectAlias: tls-app.pem

A kubernetes.io/tls Secret requires both tls.crt and tls.key. The provider splits the PEM bundle when you map both keys to the same mounted object, because objectType: secret against a certificate returns the concatenated private key and cert chain. The cert-to-TLS-Secret decisions you must get right:

| Decision | Right choice | What the wrong choice does | Why |
|---|---|---|---|
| Object type for the cert | secret | cert returns no key → no tls.key | TLS needs the private key |
| objectFormat | pem | pfx won’t split into crt/key cleanly | Ingress expects PEM |
| Both keys → same mounted file | tls.crt + tls.key from tls-app.pem | mapping different files | Provider splits one bundle |
| Keep a pod mounting the SPC alive | a keeper/pause pod | letting all consumers die | GC removes the synced TLS Secret |
| Rotation on | --enable-secret-rotation | off → cert never refreshes | Auto-renewed certs must propagate |
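To see what "one bundle, two keys" means in practice, you can separate a mounted PEM bundle by hand. The sketch below is an inspection aid, not the provider's own implementation; pem_blocks is a hypothetical helper name, and it assumes the bundle uses standard -----BEGIN/-----END markers.

```shell
# Print only the PEM blocks whose BEGIN label matches the given pattern.
# usage: pem_blocks 'CERTIFICATE' < bundle.pem
#        pem_blocks 'PRIVATE KEY' < bundle.pem
pem_blocks() {
  awk -v label="$1" '
    /^-----BEGIN / { keep = ($0 ~ label) }   # entering a block: keep it only if the label matches
    keep           { print }                 # print BEGIN line, body and END line of kept blocks
    /^-----END /   { keep = 0 }              # leaving a block
  '
}
```

Piping the mounted tls-app.pem through pem_blocks 'CERTIFICATE' should show the chain and through pem_blocks 'PRIVATE KEY' the key; if the second call prints nothing, the object was most likely fetched as objectType: cert rather than secret.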

Keep a pod mounting the SPC alive so the synced TLS Secret is never garbage-collected out from under the ingress controller — this is why teams run a tiny pause-style keeper pod alongside the controller. The garbage-collection trap, made explicit:

| Scenario | Synced TLS Secret state | Ingress result |
|---|---|---|
| ≥1 pod mounts the SPC | Secret present, fresh | TLS serves normally |
| All mounting pods deleted/evicted | Secret garbage-collected | Ingress 503 / cert disappears |
| Keeper pod holds the mount | Secret persists independently of app | TLS stable across app rollouts |
| Cert rotates, keeper alive | Secret updated in place on next poll | Ingress picks up new cert |
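A quick way to check whether anything still holds the mount is to count the pods that reference the SPC. The helper below is a sketch with a hypothetical name; it expects one line per pod on stdin (pod name, a tab, then the SecretProviderClass names that pod mounts), which is the shape the kubectl jsonpath query in the comment is intended to produce. The app-tls name and ingress namespace come from the example above.

```shell
# Count pods that mount a given SecretProviderClass.
# stdin: one line per pod, "<pod-name><TAB><spc-name> [<spc-name> ...]"
count_spc_consumers() {
  awk -F'\t' -v spc="$1" '
    {
      n = split($2, names, " ")
      for (i = 1; i <= n; i++) if (names[i] == spc) { c++; break }
    }
    END { print c + 0 }
  '
}

# Example feed (assumes the app-tls SPC in the ingress namespace):
#   kubectl get pods -n ingress -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.volumes[*].csi.volumeAttributes.secretProviderClass}{"\n"}{end}' \
#     | count_spc_consumers app-tls
```

A result of 0 means no pod is holding the mount, so the synced TLS Secret is gone or about to be collected. Note that the query lists pods in any phase, so a count of 1 made up of a terminating pod is not a safe state either.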

Architecture at a glance

Read the diagram left to right as the token-and-secret path the way the provider walks it at mount time. On the far left, a labelled application pod in the payments namespace runs under the app-sa service account. Because the pod carries azure.workload.identity/use: "true", the AKS workload-identity webhook projects a short-lived OIDC token into the pod and injects the AZURE_* environment variables. The pod mounts a CSI volume that points at the app-kv SecretProviderClass. That mount is the trigger for everything downstream — with no pod and no mount, nothing here fires.

In the middle zone live the managed add-on components in kube-system: the Secrets Store CSI driver DaemonSet and the Azure provider DaemonSet. When the volume mounts, the provider takes the pod’s projected token and performs the federated exchange against Entra — the federated identity credential on the user-assigned managed identity says “this OIDC issuer plus system:serviceaccount:payments:app-sa equals this identity,” so Entra returns an access token with no client secret ever involved. The provider then calls the Key Vault data plane over HTTPS 443, authorized by the identity’s RBAC role (Key Vault Secrets User for a secret, Certificate User for a key/cert). The returned objects are written to a tmpfs mount at /mnt/secrets-store, and — if secretObjects is set — mirrored into a native Kubernetes Secret for env-var consumption. With rotation enabled, the driver re-polls the vault every interval (default 2m) and rewrites the file and synced Secret on change. The numbered badges mark the exact hops where this path breaks, and the legend narrates each as symptom · confirm · fix.

Figure: AKS Secrets Store CSI architecture. A labelled pod with workload-identity token projection mounts a CSI volume; the CSI driver and Azure provider DaemonSets in kube-system perform a federated token exchange against Entra using a user-assigned managed identity's federated credential, then read secrets, keys and certificates from the Key Vault data plane over HTTPS 443 and write them to a tmpfs mount and a synced Kubernetes Secret, with a rotation poll loop refreshing on a 2-minute interval; numbered badges mark the federation, RBAC, sync-mapping, env-var-staleness and garbage-collection failure points.

The five badges map to the five things that actually go wrong in production: a federation subject mismatch (AADSTS70021) at the Entra exchange, a wrong RBAC role (403) at the vault, an objectName-vs-filename mismatch that yields an empty synced Secret, an env-var consumer that never sees a rotation, and a garbage-collected synced TLS Secret when the last consuming pod dies. Hold this picture and every later failure mode has a home.

Real-world scenario

A payments platform team ran a 40-node AKS cluster shared by eight product squads. They had standardized early on the add-on’s auto-created managed identity, granting it Key Vault Secrets User on a single shared vault. It worked — and it became a finding in their PCI assessment. Because the credential was node-scoped, every pod on every node could read every secret in the vault. The blast radius of one compromised pod was the entire secret store, and the audit log could not attribute a secret get to a workload, since all reads came from one identity.

The constraint: they could not split into per-team clusters (cost and operational load), nor take a maintenance window long enough to re-platform. They needed per-squad isolation on the existing cluster, with auditable, attributable Key Vault access, and zero downtime.

The fix was a migration to workload identity, one squad at a time. They enabled the OIDC issuer and workload-identity webhook in place (az aks update --enable-oidc-issuer --enable-workload-identity — a control-plane reconcile; pods kept running). Each squad got its own user-assigned identity, its own vault, and a federated credential bound to that squad’s service account:

# Per squad: scope the identity to exactly its service account + its vault
az identity create --name id-squad-payments --resource-group rg-platform
CID=$(az identity show -g rg-platform -n id-squad-payments --query clientId -o tsv)

az role assignment create --role "Key Vault Secrets User" \
  --assignee "$CID" \
  --scope "$(az keyvault show -n kv-squad-payments --query id -o tsv)"

# AKS_OIDC_ISSUER holds the cluster's OIDC issuer URL (oidcIssuerProfile.issuerUrl from az aks show)
az identity federated-credential create \
  --name fic-payments --identity-name id-squad-payments --resource-group rg-platform \
  --issuer "$AKS_OIDC_ISSUER" \
  --subject system:serviceaccount:payments:payments-sa \
  --audience api://AzureADTokenExchange

The migration was reversible per workload: they kept the add-on identity’s role assignment until each squad’s pods rolled over to the new SPC (usePodIdentity: "false", clientID set to the squad’s identity) and verified mounts, then revoked the shared assignment last. Because rotation was already on, no app-side change was needed on the data path — only the identity backing the mount changed.

What the migration cost and bought, squad by squad:

| Phase | Action | Downtime | Risk | Rollback |
|---|---|---|---|---|
| 0 | Enable OIDC + WI webhook | None | Control-plane reconcile only | Disable workload identity (the OIDC issuer cannot be disabled once enabled) |
| 1 | Per-squad UAMI + vault + FIC | None | New resources, no traffic yet | Delete the resources |
| 2 | Roll one squad’s pods to new SPC | Rolling, none | Mount could fail → caught in canary | Roll back the Deployment |
| 3 | Verify per-squad vault logs attribute reads | None | Observation only | n/a |
| 4 | Revoke shared add-on identity role last | None | Only after all squads migrated | Re-add the role assignment |

The finding closed: Key Vault diagnostic logs now attributed every SecretGet to a named per-squad identity, and a compromised pod could reach exactly one squad’s secrets. The before/after in numbers:

| Property | Before (shared add-on identity) | After (per-squad Workload ID) |
|---|---|---|
| Identities reading the vault | 1 (node-scoped) | 8 (one per squad) |
| Blast radius of one popped pod | All secrets, all squads | That squad’s secrets only |
| Audit attribution | Unattributable | Per-squad in KV logs |
| Client secrets stored | 0 (already MI) | 0 (federated) |
| Downtime to migrate | n/a | Zero |

Advantages and disadvantages

The honest two-column trade-off:

| Advantages | Disadvantages |
|---|---|
| Secrets never in git or (by default) etcd | Mount coupled to a running pod: no pod, no secret |
| Passwordless: federated, no client secret | Steeper setup than kubectl create secret |
| Per-SA identity → least privilege + audit | Subject-mismatch errors surface only at pod start |
| tmpfs mount: memory-backed, not on disk | Env-var consumers need Reloader to see rotation |
| Auto-rotation pulls new versions on a poll | Rotation lag = poll interval + sync (not instant) |
| First-party AKS add-on, platform-maintained | Synced Secret GC’d when last consumer dies |
| Works for secrets, keys, and TLS certs | RBAC role per object type is non-intuitive |
| Centralized rotation/versioning in Key Vault | Aggressive polling can throttle the vault data plane |

When each side matters: choose the CSI driver whenever you need auditable, attributable, rotatable secret access on a shared cluster — the disadvantages are operational learning curves, not architectural dead-ends. Prefer plain Kubernetes Secrets only for throwaway dev clusters where audit and rotation are non-goals. For env-var-heavy apps where you cannot adopt Reloader, lean on file or mounted-Secret-volume consumption so rotation reaches the app without a restart. Where to use which model:

| Situation | Use | Why |
|---|---|---|
| Multi-squad shared cluster, audit required | Workload ID + CSI | Per-SA attribution and least privilege |
| Single-tenant demo cluster | Add-on identity + CSI (or plain Secret) | Speed over isolation |
| App reads config from files | CSI mounted file | Rotation reaches it on next read |
| App reads config from env vars | CSI sync + Reloader | Restart on rotation |
| Ingress TLS from a rotating cert | CSI sync to kubernetes.io/tls + keeper pod | Cert propagates, Secret survives GC |
| Crypto operations (sign/verify) | Key Vault SDK directly, not CSI | Provider doesn’t do crypto ops |

Hands-on lab

A copy-pasteable walk-through. It assumes an existing AKS cluster and permission to create a Key Vault and role assignments in the resource group. Costs are negligible (a UAMI is free; Key Vault operations cost a small fraction of a paisa each, roughly ₹2.5 per 10,000).

# 0. Variables
export RESOURCE_GROUP=rg-lab
export CLUSTER_NAME=aks-lab
export KEYVAULT_NAME=kv-lab-$RANDOM
export UAMI=id-lab-secrets
export SA_NAME=demo-sa
export SA_NAMESPACE=demo

# 1. Enable the add-on + Workload ID prerequisites (idempotent)
az aks update -g $RESOURCE_GROUP -n $CLUSTER_NAME \
  --enable-oidc-issuer --enable-workload-identity
az aks enable-addons -g $RESOURCE_GROUP -n $CLUSTER_NAME \
  --addons azure-keyvault-secrets-provider
az aks get-credentials -g $RESOURCE_GROUP -n $CLUSTER_NAME --overwrite-existing

# 2. Create the vault (RBAC mode) and a test secret
az keyvault create -g $RESOURCE_GROUP -n $KEYVAULT_NAME --enable-rbac-authorization true
az role assignment create --role "Key Vault Secrets Officer" \
  --assignee "$(az ad signed-in-user show --query id -o tsv)" \
  --scope "$(az keyvault show -n $KEYVAULT_NAME --query id -o tsv)"
# RBAC assignments can take a minute or two to propagate; retry if this returns Forbidden
az keyvault secret set --vault-name $KEYVAULT_NAME --name db-connection-string \
  --value "Server=db;Pwd=initial"

# 3. Create the workload identity and grant it READ on the vault
az identity create -g $RESOURCE_GROUP -n $UAMI
export CID=$(az identity show -g $RESOURCE_GROUP -n $UAMI --query clientId -o tsv)
export TID=$(az aks show -g $RESOURCE_GROUP -n $CLUSTER_NAME --query identity.tenantId -o tsv)
az role assignment create --role "Key Vault Secrets User" --assignee "$CID" \
  --scope "$(az keyvault show -n $KEYVAULT_NAME --query id -o tsv)"

# 4. Service account + federated credential
export ISSUER=$(az aks show -g $RESOURCE_GROUP -n $CLUSTER_NAME \
  --query oidcIssuerProfile.issuerUrl -o tsv)
kubectl create namespace $SA_NAMESPACE
kubectl create serviceaccount $SA_NAME -n $SA_NAMESPACE
kubectl annotate serviceaccount $SA_NAME -n $SA_NAMESPACE \
  azure.workload.identity/client-id=$CID
az identity federated-credential create --name fic-lab --identity-name $UAMI \
  -g $RESOURCE_GROUP --issuer "$ISSUER" \
  --subject system:serviceaccount:$SA_NAMESPACE:$SA_NAME \
  --audience api://AzureADTokenExchange

# 5. SecretProviderClass (note clientID, tenantId, and the sync block)
cat <<EOF | kubectl apply -f -
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: lab-kv
  namespace: $SA_NAMESPACE
spec:
  provider: azure
  secretObjects:
    - secretName: lab-db
      type: Opaque
      data:
        - objectName: db-connection-string
          key: DB_CONNECTION_STRING
  parameters:
    usePodIdentity: "false"
    clientID: "$CID"
    keyvaultName: "$KEYVAULT_NAME"
    tenantId: "$TID"
    objects: |
      array:
        - |
          objectName: db-connection-string
          objectType: secret
          objectVersion: ""
EOF

# 6. A pod that mounts the volume AND reads the synced Secret as env
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: demo
  namespace: $SA_NAMESPACE
  labels:
    azure.workload.identity/use: "true"
spec:
  serviceAccountName: $SA_NAME
  containers:
    - name: app
      image: mcr.microsoft.com/azure-cli:latest
      command: ["sleep","3600"]
      env:
        - name: DB_CONNECTION_STRING
          valueFrom:
            secretKeyRef: { name: lab-db, key: DB_CONNECTION_STRING }
      volumeMounts:
        - name: kv
          mountPath: /mnt/secrets-store
          readOnly: true
  volumes:
    - name: kv
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: "lab-kv"
EOF

# 7. Verify the mount, the synced Secret, and env injection
kubectl wait --for=condition=Ready pod/demo -n $SA_NAMESPACE --timeout=120s
kubectl exec -n $SA_NAMESPACE demo -- ls /mnt/secrets-store/
kubectl exec -n $SA_NAMESPACE demo -- cat /mnt/secrets-store/db-connection-string; echo
kubectl get secret lab-db -n $SA_NAMESPACE -o jsonpath='{.data.DB_CONNECTION_STRING}' | base64 -d; echo
kubectl exec -n $SA_NAMESPACE demo -- printenv DB_CONNECTION_STRING

# 8. Prove rotation reaches the FILE but NOT the env var
az aks addon update -g $RESOURCE_GROUP -n $CLUSTER_NAME \
  --addon azure-keyvault-secrets-provider --enable-secret-rotation
az keyvault secret set --vault-name $KEYVAULT_NAME --name db-connection-string \
  --value "Server=db;Pwd=ROTATED-$(date +%s)"
sleep 150   # ~ one default poll interval + sync slack
kubectl exec -n $SA_NAMESPACE demo -- cat /mnt/secrets-store/db-connection-string; echo  # NEW value
kubectl exec -n $SA_NAMESPACE demo -- printenv DB_CONNECTION_STRING                       # OLD value (snapshot)

# 9. Teardown
kubectl delete namespace $SA_NAMESPACE
az identity federated-credential delete --name fic-lab --identity-name $UAMI -g $RESOURCE_GROUP --yes
az identity delete -g $RESOURCE_GROUP -n $UAMI
az keyvault delete -n $KEYVAULT_NAME -g $RESOURCE_GROUP
az keyvault purge -n $KEYVAULT_NAME   # soft-delete retains the vault until purged; purge is blocked if purge protection is enabled

Expected outputs at each verify step:

| Step | Command | Expected result |
|---|---|---|
| 7 | ls /mnt/secrets-store/ | db-connection-string |
| 7 | cat .../db-connection-string | Server=db;Pwd=initial |
| 7 | get secret lab-db ... base64 -d | Server=db;Pwd=initial |
| 7 | printenv DB_CONNECTION_STRING | Server=db;Pwd=initial |
| 8 | cat .../db-connection-string (post-rotate) | Server=db;Pwd=ROTATED-... (NEW) |
| 8 | printenv DB_CONNECTION_STRING (post-rotate) | Server=db;Pwd=initial (OLD; the lesson) |

Step 8 is the whole point of the lab: the file updates, the env var does not. To make the env var update too, run the app as a Deployment (the lab uses a bare Pod, which Reloader does not manage), annotate it for Reloader, and let Reloader roll the pods.
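The fixed sleep 150 in step 8 is a guess at one poll interval plus slack. If you would rather wait for the rotation to actually land, a small polling helper does the job. This is a sketch: wait_for_output and the WAIT_STEP variable are hypothetical names, and the 10-second step is an arbitrary default.

```shell
# Re-run a command until its output contains the expected text, or time out.
# usage: wait_for_output <expected-substring> <timeout-seconds> <command...>
wait_for_output() (
  expected=$1; timeout=$2; shift 2
  step=${WAIT_STEP:-10}       # seconds between attempts (assumed default)
  elapsed=0
  while :; do
    out=$("$@" 2>/dev/null) || true
    case "$out" in
      *"$expected"*) echo "$out"; exit 0 ;;   # found: print the output and succeed
    esac
    [ "$elapsed" -ge "$timeout" ] && exit 1   # give up once the timeout is reached
    sleep "$step"
    elapsed=$(( elapsed + step ))
  done
)

# Example (names from the lab):
#   wait_for_output ROTATED 300 \
#     kubectl exec -n "$SA_NAMESPACE" demo -- cat /mnt/secrets-store/db-connection-string
```

Pointing the same helper at printenv DB_CONNECTION_STRING should time out, which is the env-var snapshot behaviour the lab is demonstrating.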

Common mistakes & troubleshooting

This section is the differentiator: every failure mode here is one we have actually hit. Read the playbook table top to bottom mid-incident; the prose under it explains the gnarly ones.

| # | Symptom | Root cause | Confirm (exact command/path) | Fix |
|---|---|---|---|---|
| 1 | Pod stuck ContainerCreating | CSI volume mount is failing (the cause is in the pod events) | kubectl describe pod <p> -n <ns> → Events | Read the event string; jump to the matching row below |
| 2 | Event: No matching federated identity record found / AADSTS70021 | FIC --subject or issuer mismatch | az identity federated-credential list --identity-name <uami> -g <rg> | Recreate FIC with exact system:serviceaccount:<ns>:<sa> and the cluster issuer |
| 3 | Event: does not have secrets get permission (403) | Wrong/missing Key Vault RBAC | az role assignment list --assignee <cid> --scope <kv-id> | Assign correct role (Secrets vs Certificate User) at vault scope |
| 4 | Event: Secret not found | objectName not in the vault (typo/wrong vault) | az keyvault secret show --vault-name <kv> --name <obj> | Fix the name or create the object |
| 5 | Synced Secret exists but data: {} (empty) | secretObjects.data.objectName ≠ mounted filename | Compare SPC value to kubectl exec ... ls /mnt/secrets-store | Set it to the alias (or objectName if no alias) |
| 6 | Synced Secret never appears | No pod has mounted the SPC yet | kubectl get secretproviderclasspodstatus -n <ns> | Deploy a pod that mounts the volume |
| 7 | Synced Secret vanished; ingress 503 | GC’d: last consuming pod deleted | kubectl get pods -n <ns> referencing the SPC | Run a keeper pod that holds the mount |
| 8 | Rotated value never reaches app | App reads env var (start-time snapshot) | printenv shows old; file shows new | Use file/Secret-volume consumption, or Reloader to restart |
| 9 | Rotation never updates a mounted file | Rotation disabled, or subPath mount | az aks show ... config; check subPath in the volume | Enable rotation; remove subPath |
| 10 | Auth fails despite correct FIC | Pod missing azure.workload.identity/use: "true" | kubectl get pod <p> -o yaml \| grep workload.identity | Add the label; recreate the pod |
| 11 | clientID ... not found | SA annotation client-id wrong | kubectl get sa <sa> -n <ns> -o yaml vs az identity show | Match annotation to the UAMI clientId |
| 12 | 429 Too Many Requests from Key Vault | Poll interval too short for fleet size | Provider DaemonSet logs; KV ServiceApiHit | Widen --rotation-poll-interval; split SPCs |
| 13 | TLS Secret empty / ingress handshake fails | objectType: cert (no key) instead of secret | Inspect SPC objects and synced Secret keys | Use objectType: secret for full chain + key |
| 14 | SPC apply errors / object silently skipped | Malformed nested objects YAML (- \|) | kubectl get secretproviderclass <name> -n <ns> -o yaml; re-validate indentation | Fix the block scalar; one - \| per item |
| 15 | Cross-tenant vault AADSTS error | tenantId is the cluster tenant, not the vault’s | az keyvault show ... --query properties.tenantId | Set tenantId to the vault’s resource tenant |
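Row 1 says to read the event string and jump to the matching row. That lookup can be sketched as a case statement over the strings quoted in the table. Treat it as an illustration: triage_event is a hypothetical helper, and the exact event text depends on the driver and provider version, so the patterns may need adjusting.

```shell
# Map a pod event message to the matching playbook row (patterns taken from the table above).
triage_event() {
  case "$1" in
    *AADSTS70021*|*"No matching federated identity record found"*)
      echo "row 2: federated credential subject or issuer mismatch" ;;
    *"does not have secrets get permission"*)
      echo "row 3: wrong or missing Key Vault RBAC role" ;;
    *"Secret not found"*)
      echo "row 4: objectName not in the vault" ;;
    *"Too Many Requests"*)
      echo "row 12: poll interval too short for fleet size" ;;
    *)
      echo "no match: read the full event and check the remaining rows" ;;
  esac
}
```

Feed it the message text from the Events section of kubectl describe pod. Only the rows that surface as a distinctive event string are covered; the silent failures (empty or missing synced Secret, stale env var) still need the manual checks in the table.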

The three that burn the most hours

Subject mismatch (#2). The FIC error appears at pod start, never at apply, because the binding is only exercised when a token is actually exchanged. The subject is case-sensitive and has a fixed shape: system:serviceaccount:<namespace>:<serviceaccount>. Re-check the shapes in the subject-mismatch table: a _ for a -, a wrong namespace, or a missing system: prefix all produce the identical AADSTS70021. Confirm with az identity federated-credential list and compare byte-for-byte to your pod’s serviceAccountName and namespace.
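The byte-for-byte comparison is easy to script. A minimal sketch, assuming you have already pulled the subject string out of the federated credential (for example from the subject field in the az identity federated-credential list output); both function names are hypothetical.

```shell
# Build the subject a federated credential must carry for a given service account.
# args: <namespace> <serviceaccount>
expected_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Succeeds only if the configured subject matches exactly (case-sensitive).
# args: <configured-subject> <namespace> <serviceaccount>
subject_matches() {
  [ "$1" = "$(expected_subject "$2" "$3")" ]
}
```

For the scenario above, subject_matches "$SUBJECT" payments payments-sa returns non-zero for payments_sa, for a missing service-account segment, and for any difference in case.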

Empty synced Secret (#5). You apply the SPC, the pod runs, the file is on the mount, but kubectl get secret app-db -o jsonpath='{.data}' returns {}. The cause is almost always that secretObjects.data.objectName was set to the Key Vault object name when it must be the mounted filename. If you set objectAlias: tls-app.pem, the file is tls-app.pem and the sync must reference tls-app.pem, not tls-app. Trace the mapping chain: vault name → objectName → objectAlias (if set) → mounted filename → secretObjects.data.objectName.
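The mapping rule reduces to "alias if set, otherwise object name". A small sketch that encodes it, with hypothetical helper names, so you can sanity-check an SPC before applying it:

```shell
# The filename on the mount is the objectAlias when set, otherwise the objectName.
# args: <objectName> [objectAlias]
mounted_filename() {
  if [ -n "${2:-}" ]; then echo "$2"; else echo "$1"; fi
}

# Succeeds if the secretObjects reference points at a file that will exist on the mount.
# args: <secretObjects.data.objectName> <objects[].objectName> [objects[].objectAlias]
sync_reference_ok() {
  [ "$1" = "$(mounted_filename "$2" "${3:-}")" ]
}
```

With the TLS example from earlier, sync_reference_ok tls-app.pem app-cert tls-app.pem succeeds, while sync_reference_ok app-cert app-cert tls-app.pem fails, which is the empty-Secret case.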

Env-var staleness (#8). Everything looks healthy — the vault has the new value, the file has the new value, the synced Secret has the new value — yet the app behaves on the old credential. Environment variables are injected once at container start and never change. Confirm by comparing cat /mnt/secrets-store/<file> (new) to printenv <VAR> (old). The fix is structural: read from the file or a mounted Secret volume, or annotate the workload for Reloader so a rotation triggers a rolling restart.

Best practices

Security notes

The CSI driver’s security value is that secret material never touches git, never (by default) touches etcd, and lands only on a memory-backed tmpfs mount — but you still have to wire identity and network correctly. The controls that matter, with the setting that enforces each:

| Control | Why it matters | How to enforce | Verify |
|---|---|---|---|
| Per-SA Workload ID | Least privilege, audit attribution | UAMI + FIC per service account | Vault logs show per-SA SecretGet |
| Minimal RBAC role | Limits what a popped pod can read | Secrets/Certificate User at vault scope | az role assignment list --scope <kv> |
| No K8s sync unless needed | Keeps secrets out of etcd | Omit secretObjects for file-only apps | kubectl get secret shows none synced |
| tmpfs mount | Secrets never written to node disk | Default behaviour of the driver | Mount is memory-backed |
| Vault firewall / Private Endpoint | Vault not reachable from the internet | KV network rules + PE to the cluster subnet | az keyvault show ... networkAcls |
| Soft-delete + purge protection | Recover a deleted/rotated secret | Vault properties | az keyvault show ... enableSoftDelete |
| Pod label gating | Only intended pods get a token | azure.workload.identity/use: "true" | Absent label → no token projected |
| Read-only mount | Pod cannot tamper with secrets | readOnly: true on the volume | Write attempts fail |

Network isolation deserves emphasis: put the vault behind a Private Endpoint into the cluster’s subnet and set networkAcls to deny public traffic, so even a leaked identity cannot reach the vault from outside your network. Combine with Entra managed identities and federated credentials hardening: short-lived federated tokens with no stored secret are the baseline, and Conditional Access on the workload identity raises the bar further where it applies (Conditional Access for workload identities targets service principals and, at the time of writing, does not cover managed identities, so verify coverage for your identity type). The least-privilege ladder, from worst to best:

| Posture | Identity | Scope | Verdict |
|---|---|---|---|
| Worst | Add-on identity, shared | Whole vault | Unattributable, max blast radius |
| Better | Per-SA UAMI | Whole vault | Attributable, still broad |
| Good | Per-SA UAMI | Per-secret scope | Least privilege per workload |
| Best | Per-SA UAMI + PE-only vault + CA | Per-secret scope | Network + identity + policy defence-in-depth |

Cost & sizing

The CSI driver itself is free — it is a first-party add-on with no licence cost. What you pay for is Key Vault operations and a trivial amount of node resource for the DaemonSets. The cost drivers and rough figures (INR at ~₹84/USD, indicative):

| Cost driver | Unit | Rough cost | Notes |
|---|---|---|---|
| Secrets Store CSI add-on | per cluster | ₹0 (free) | First-party AKS add-on |
| Key Vault operations | per 10,000 transactions | ~₹2.5 (~$0.03) | Each poll fetches each object = transactions |
| Key Vault (Standard tier) | per vault | no base fee; pay per op | Premium adds HSM-backed keys |
| Key Vault (Premium HSM) | per HSM key/month | ~₹85 (~$1) + ops | Only if you need HSM-backed keys |
| UAMI | per identity | ₹0 (free) | No charge for managed identities |
| DaemonSet CPU/memory | per node | negligible | Tiny footprint per node |
| NAT/Private Endpoint (optional) | per endpoint/hour | ~₹0.8/hr (~$0.01) | If you isolate the vault network |

The variable that actually moves the bill is poll frequency × object count × pod count, because each poll fetches each object as a billable transaction. A worked example:

| Scenario | Pods | Objects/SPC | Poll interval | Transactions/day | Rough cost/day |
|---|---|---|---|---|---|
| Small app | 3 | 2 | 2m | ~12,960 | ~₹3 |
| Medium fleet | 50 | 3 | 2m | ~324,000 | ~₹81 |
| Large fleet, tight poll | 200 | 4 | 30s | ~6.9M | ~₹1,725 |
| Large fleet, relaxed poll | 200 | 4 | 5m | ~691,200 | ~₹173 |

The right-sizing levers, in order of impact: widen the poll interval to your real rotation SLA (a 5m interval is 2.5× cheaper than 2m and usually fine), pin rarely-changing objects to a version so they are not re-fetched every poll, and split SPCs so unrelated objects do not all poll on the same cadence. The Standard tier has no base fee; reach for Premium only when you genuinely need HSM-backed keys. There is no free-tier limit to worry about here — the per-transaction cost is small, and the failure mode at the high end is throttling (429), not a surprise invoice.
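The poll frequency × object count × pod count formula is simple enough to run for your own fleet. The sketch below uses hypothetical function names and the ~₹2.5 per 10,000 transactions figure from the cost table. One caveat: the worked-example table's transaction counts come out at three times the plain formula (3 pods × 2 objects × 720 polls is 4,320, not 12,960), so the helper takes an optional calls-per-object factor. That factor of 3 is inferred from the table's numbers, not something the text states, so measure the real ratio from your vault's metrics before relying on it.

```shell
# Estimated Key Vault transactions per day for the rotation poll loop.
# args: <pods> <objects-per-SPC> <poll-interval-seconds> [calls-per-object, default 1]
transactions_per_day() {
  echo $(( $1 * $2 * (86400 / $3) * ${4:-1} ))
}

# Daily cost in paise at an assumed ₹2.5 per 10,000 transactions (250 paise per 10,000).
# args: <transactions-per-day>
cost_paise_per_day() {
  echo $(( $1 * 250 / 10000 ))
}
```

For the small-app row, transactions_per_day 3 2 120 3 reproduces the table's count, and passing that to cost_paise_per_day gives the daily cost in paise (divide by 100 for rupees). Dropping the fourth argument shows the figure under the one-call-per-object reading.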

Interview & exam questions

Q1. Why does the add-on always create a managed identity even when you intend to use Workload ID? The add-on provisions a user-assigned managed identity in the node resource group and assigns it to the VMSS as part of installation; you cannot suppress it. You simply do not use it — your SPC points at a different workload identity via clientID and usePodIdentity: "false". Maps to AZ-500/CKS identity topics.

Q2. A pod is stuck in ContainerCreating with AADSTS70021. What is wrong and how do you confirm? The federated credential’s subject (or issuer) does not match the pod’s system:serviceaccount:<ns>:<sa> and the cluster OIDC issuer. Confirm with az identity federated-credential list and compare to the pod’s namespace and serviceAccountName. The error appears at pod start, not at apply.

Q3. You mount an objectType: secret against a Key Vault certificate. What do you get, and what RBAC role is required? You get the full PEM chain plus the private key (the cert’s secret backing), which requires Key Vault Secrets User. By contrast, key/cert object types go through the certificate path and need Key Vault Certificate User.

Q4. The synced Kubernetes Secret exists but is empty. Most likely cause? secretObjects.data.objectName was set to the Key Vault object name instead of the mounted filename. If an objectAlias is set, the filename is the alias and the sync must reference it. Fix the mapping.

Q5. You rotate a secret in Key Vault. The mounted file updates but the app keeps using the old value. Why? The app reads the secret as an environment variable, which is a snapshot taken at container start and never changes. Switch to file or mounted-Secret-volume consumption, or use Reloader to restart the pod on Secret change.

Q6. Why does a subPath-mounted Secret not pick up rotation? It is a kubelet limitation: subPath volume mounts do not receive in-place updates. Mount the whole volume instead of a subPath.

Q7. What happens to the synced Secret when the last pod consuming it is deleted, and why does it matter for ingress? The driver garbage-collects it. An ingress controller depending on a synced kubernetes.io/tls Secret will lose its cert and start failing. Run a keeper pod that holds the mount so the Secret persists.

Q8. Where do secrets land on the node, and what is the significance for etcd? On a tmpfs (memory-backed) mount, not on disk. Unless you opt into the K8s sync via secretObjects, nothing is written to etcd — which is the security win over plain Kubernetes Secrets.

Q9. How do you make an env-var-consuming app pick up rotation automatically? You cannot change a running container’s env. Annotate the workload with reloader.stakater.com/auto: "true" (running Reloader) so a change to the synced Secret triggers a rolling restart that re-injects the new value.

Q10. Your fleet started getting 429 from Key Vault after enabling rotation. What changed and how do you fix it? Each poll fetches each object as a transaction; a short poll interval across many pods/objects exceeds the data-plane request limit. Widen --rotation-poll-interval, split SPCs, and pin rarely-changing objects to reduce per-poll fetches.

Q11. Which two SPC parameters signal Workload ID specifically, and what value must each take? usePodIdentity: "false" and clientID set to the workload identity’s clientId (a string). That combination tells the provider to use the projected service-account token.

Q12. Why is the CSI mount fundamentally pull-based, and what is the operational consequence? The provider only fetches at mount time (and on rotation polls). The consequence: a secret’s availability is coupled to a running pod and a working identity — no pod means no Secret, and a bad identity means the pod never starts.

Quick check

  1. You want rotation to reach your app without a restart. Which consumption pattern do you choose, and which one do you avoid?
  2. Your federated credential --subject reads system:serviceaccount:payments. What error will you see and when?
  3. You mount objectType: cert and your ingress TLS handshake fails. What did you do wrong?
  4. The synced Secret is empty though the file is present on the mount. What single field is misconfigured?
  5. An ingress controller’s TLS Secret disappeared after a deploy. What lifecycle behaviour caused it and what is the fix?

Answers

  1. Read from the mounted file or a mounted-Secret volume (both update in place on the next poll); avoid env vars, which are a start-time snapshot and never change in a running container.
  2. AADSTS70021: No matching federated identity record found, at pod start (not at az/apply time) — the subject is missing the SA-name segment; it must be system:serviceaccount:payments:<sa>.
  3. You used objectType: cert, which returns the certificate only with no private key, so there is no tls.key. Use objectType: secret against the cert to get the full PEM chain plus key.
  4. secretObjects.data.objectName — it must equal the mounted filename (the objectAlias if set, otherwise objectName), not the Key Vault object name.
  5. The synced Secret is garbage-collected when the last consuming pod is deleted. During the deploy all consuming pods rolled out and the Secret vanished. Run a keeper pod that holds the SPC mount so the Secret persists across rollouts.

Glossary

Next steps


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
