Managed Identities Deep Dive: User-Assigned Identities, Federated Credentials, and RBAC Patterns for Azure Workloads

Almost every “the pipeline can’t reach Key Vault” incident I get pulled into ends the same way: a system-assigned identity that got recreated on a redeploy, taking every role assignment with it. Managed identities are the right answer to “stop putting secrets in app settings,” but the defaults push teams toward the brittle pattern. This is the architecture I deploy instead: user-assigned identities as first-class, reusable resources; federated identity credentials so external workloads never hold an Azure secret; and RBAC scoped tightly enough that a compromised workload can’t pivot. It assumes you can create role assignments (Owner, User Access Administrator, or Role Based Access Control Administrator at the relevant scope) and have a basic grasp of Entra service principals.

1. System-assigned vs user-assigned: lifecycle and blast radius

Both create a service principal that Azure manages — no credential ever lands in your code. The difference is lifecycle, and that dictates your blast radius.

| Property | System-assigned | User-assigned |
| --- | --- | --- |
| Lifecycle | Tied to the parent resource; deleted with it | Standalone Azure resource (Microsoft.ManagedIdentity/userAssignedIdentities) |
| Reuse | One identity, one resource | One identity, many resources |
| Role assignments survive redeploy | No: SP is recreated, assignments lost | Yes: identity and its principalId persist |
| Federated credentials | Not supported | Supported (this is the big one) |
| Cleanup risk | Self-cleaning | Orphans if not governed |

A system-assigned identity feels simpler, but its principalId is regenerated whenever the resource is recreated — and role assignments reference the principal ID, not the resource. Recreate a Function App and every Key Vault Secrets User grant silently evaporates: a clean 403 at runtime, because the assignment in IaC “exists” but points at a principal that’s gone.

User-assigned identities decouple the identity from its consumers. Create it once, in its own resource group, then attach it to the VM, AKS pod, or Function App. Redeploy the compute all day; the principalId and its grants stay put.

# Create a user-assigned identity as a standalone, reusable resource
az identity create \
  --name id-payments-prod \
  --resource-group rg-identity-prod \
  --location eastus2

# Capture the three values you will reference everywhere
CLIENT_ID=$(az identity show -n id-payments-prod -g rg-identity-prod --query clientId -o tsv)
PRINCIPAL_ID=$(az identity show -n id-payments-prod -g rg-identity-prod --query principalId -o tsv)
RESOURCE_ID=$(az identity show -n id-payments-prod -g rg-identity-prod --query id -o tsv)

Blast-radius rule I enforce: one identity per workload (a deployable unit with a single owner), not per resource and not spanning environments. Sharing across prod and non-prod means a non-prod compromise carries prod roles; per-resource means ten identities for one app and no reuse. Per-workload is the sweet spot.
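To make the lifecycle difference in this section concrete, here is a toy model in Python. It is only a sketch: `ToyRbac`, `new_principal_id`, and the `kv-payments` scope label are hypothetical stand-ins, not Azure APIs. The one thing it models is that a grant is keyed by principal ID, so a recreated principal no longer matches it.

```python
import uuid


class ToyRbac:
    """Toy model: role assignments are keyed by principal ID, not by resource."""

    def __init__(self):
        self.assignments = set()  # {(principal_id, role, scope)}

    def grant(self, principal_id, role, scope):
        self.assignments.add((principal_id, role, scope))

    def allowed(self, principal_id, role, scope):
        return (principal_id, role, scope) in self.assignments


def new_principal_id():
    # Stand-in for the principal ID issued when an identity is created.
    return str(uuid.uuid4())


rbac = ToyRbac()
role = "Key Vault Secrets User"
scope = "kv-payments"  # hypothetical scope label

# System-assigned: the identity is created together with the resource.
system_pid = new_principal_id()
rbac.grant(system_pid, role, scope)
system_pid_after_recreate = new_principal_id()  # delete + recreate => new principal

# User-assigned: the identity is created once and merely re-attached.
user_pid = new_principal_id()
rbac.grant(user_pid, role, scope)
user_pid_after_recreate = user_pid  # the compute changed, the identity did not

system_ok = rbac.allowed(system_pid_after_recreate, role, scope)
user_ok = rbac.allowed(user_pid_after_recreate, role, scope)
print("system-assigned keeps access:", system_ok)
print("user-assigned keeps access:", user_ok)
```

The old grant for the system-assigned principal is still in the set, which mirrors the "assignment exists in IaC but nothing can use it" symptom.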

2. How the token flow works: IMDS, the metadata endpoint, and caching

When code on an Azure resource asks for a token, it does not call Entra directly. It calls the Azure Instance Metadata Service (IMDS) at the non-routable 169.254.169.254, passing a Metadata: true header. The platform injects the identity’s credential into that local endpoint; your code never sees a secret. Two query params matter: resource is the audience (https://vault.azure.net for Key Vault, https://storage.azure.com for Storage, https://management.azure.com for ARM), and client_id disambiguates when more than one user-assigned identity is attached and IMDS can’t guess which you mean. On the IMDS endpoint the alternatives to client_id are object_id and msi_res_id; principal_id and mi_res_id are the names used by the separate App Service and Functions identity endpoint.

# IMDS token request, pinned to a specific user-assigned identity
curl -s -H "Metadata: true" \
  "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://vault.azure.net&client_id=${CLIENT_ID}"

The endpoint returns a JWT in access_token plus an expires_on. In real code you never hand-roll this: DefaultAzureCredential walks a chain (env vars, managed identity, Azure CLI) so the same code runs locally and in Azure. The footgun: with multiple identities attached, DefaultAzureCredential with no managed_identity_client_id is ambiguous and fails — always pin the client ID.

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Pin the user-assigned identity explicitly — never rely on the default
# when more than one identity could be attached.
credential = DefaultAzureCredential(
    managed_identity_client_id="00000000-0000-0000-0000-000000000000"
)
client = SecretClient(vault_url="https://kv-payments.vault.azure.net", credential=credential)
secret = client.get_secret("db-connection-string")
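For debugging, the raw IMDS request from earlier in this section can be reproduced with the standard library alone. This is a minimal sketch: `build_imds_url` and `fetch_token` are hypothetical helper names, and `fetch_token` only works from inside an Azure resource with an identity attached, so the example calls only the URL builder.

```python
import json
import urllib.parse
import urllib.request

IMDS_TOKEN_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_imds_url(resource, client_id=None, api_version="2018-02-01"):
    """Return the IMDS token URL for an audience, optionally pinned to one identity."""
    params = {"api-version": api_version, "resource": resource}
    if client_id:
        params["client_id"] = client_id
    return IMDS_TOKEN_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_token(resource, client_id=None, timeout=5):
    """Call IMDS and return the parsed JSON body (requires running inside Azure)."""
    request = urllib.request.Request(
        build_imds_url(resource, client_id), headers={"Metadata": "true"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


url = build_imds_url("https://vault.azure.net", "00000000-0000-0000-0000-000000000000")
print(url)
```

Percent-encoding the resource value via `urlencode` is a choice made here for safety with arbitrary audiences; the curl example passes it unencoded.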

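Caching behavior you have to plan for. The SDK caches the access token in memory until close to expiry, so the next IMDS call is cheap, but those tokens can live up to ~24 hours. A direct Azure RBAC role assignment is evaluated by the target service rather than stored in the token, so it usually takes effect within minutes. Anything carried inside the token itself, such as group membership, is frozen until the token is replaced, so a long-lived process that gets its access through a group may keep presenting a token that predates the change and needs a refresh or restart before it sees the new access. Short-lived workers and request-scoped credentials sidestep this entirely.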
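The in-memory caching described above can be pictured with a small sketch. This is not the azure-identity implementation: `CachedToken`, its `refresh_margin` default, and the fake clock are assumptions, there only to show when a cached token is reused and when it is refetched.

```python
import time


class CachedToken:
    """Reuse a token until it is within `refresh_margin` seconds of expiry."""

    def __init__(self, fetch, refresh_margin=300, clock=time.time):
        self._fetch = fetch            # callable returning (token, expires_on_epoch)
        self._margin = refresh_margin
        self._clock = clock
        self._token = None
        self._expires_on = 0.0

    def get(self):
        if self._token is None or self._clock() >= self._expires_on - self._margin:
            self._token, self._expires_on = self._fetch()
        return self._token

    def invalidate(self):
        """Drop the cached token so the next get() fetches a fresh one."""
        self._token = None


# Demo with a fake clock and a fake token source (both hypothetical).
now = [0.0]
calls = []


def fake_fetch():
    calls.append(now[0])
    return f"token-{len(calls)}", now[0] + 3600  # one-hour lifetime


cache = CachedToken(fake_fetch, refresh_margin=300, clock=lambda: now[0])
first = cache.get()
now[0] = 1800
second = cache.get()   # well before expiry: served from cache
now[0] = 3400
third = cache.get()    # inside the refresh margin: fetched again
print(first, second, third, "fetches:", len(calls))
```

A process that holds `first` for its whole lifetime keeps presenting whatever that token contained when it was issued, which is the staleness risk this section describes.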

3. Assigning least-privilege RBAC and scoping to specific resources

The identity is worthless until it has a role at a scope. The mistake I see most is granting at subscription or resource-group level “to keep it simple.” Scope to the individual resource unless the workload needs the group.

# Least privilege: data-plane read on ONE Key Vault, not the resource group
KV_ID=$(az keyvault show -n kv-payments -g rg-payments --query id -o tsv)

az role assignment create \
  --assignee-object-id "$PRINCIPAL_ID" \
  --assignee-principal-type ServicePrincipal \
  --role "Key Vault Secrets User" \
  --scope "$KV_ID"

--assignee-principal-type ServicePrincipal is not cosmetic: without it, az resolves the principal by querying Graph, which fails for a just-created identity that hasn’t replicated yet. Always pass the object ID plus the explicit type.

Pick the purpose-built data-plane role, never a management role.

| Workload needs | Correct role | Avoid |
| --- | --- | --- |
| Read Key Vault secrets | Key Vault Secrets User | Contributor, Key Vault Administrator |
| Read/write blobs | Storage Blob Data Contributor | Contributor (gives keys, not data-plane intent) |
| Read-only blobs | Storage Blob Data Reader | Storage Account Contributor |
| Query Azure SQL | None (use a contained DB user, see §6) | Server-level Azure RBAC |

Contributor on a storage account lets the identity list and regenerate account keys — a full bypass of the data-plane RBAC model. Reach for *Data* roles every time; they grant blob/queue/table access without exposing account keys.
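The role table above can be turned into a simple review check. The following is a sketch under assumptions: the need keys (`read-secrets` and so on), the function name, and the input shape (dicts with `role` and `scope`) are hypothetical; the role names are copied from the table.

```python
# Roles the table above lists under "Avoid" for workload identities.
AVOID_ROLES = {"Contributor", "Key Vault Administrator", "Storage Account Contributor"}

# Preferred role per workload need, copied from the table above.
# The dictionary keys are hypothetical labels, not Azure terms.
PREFERRED_ROLE = {
    "read-secrets": "Key Vault Secrets User",
    "read-write-blobs": "Storage Blob Data Contributor",
    "read-blobs": "Storage Blob Data Reader",
}


def review_assignments(assignments):
    """Return the (role, scope) pairs whose role is on the avoid list."""
    return [(a["role"], a["scope"]) for a in assignments if a["role"] in AVOID_ROLES]


sample = [
    {"role": "Key Vault Secrets User", "scope": "/subscriptions/s/resourceGroups/rg-payments/providers/Microsoft.KeyVault/vaults/kv-payments"},
    {"role": "Contributor", "scope": "/subscriptions/s/resourceGroups/rg-payments"},
]
findings = review_assignments(sample)
print(findings)
```

A check like this is cheap to run in CI against planned role assignments before they are applied.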

4. Federated identity credentials: trust external issuers without secrets

This is the capability that changes the architecture. A federated identity credential (FIC) lets a user-assigned identity trust tokens from an external OIDC issuer: GitHub Actions, a GKE or self-hosted Kubernetes cluster, GitLab, any compliant IdP. The workload presents its own OIDC token; Entra validates the issuer, subject, and audience against the FIC and returns an Entra token for your identity. No Azure secret ever leaves Azure. FICs work on user-assigned identities and app registrations, not system-assigned.

GitHub Actions

az identity federated-credential create \
  --name github-deploy-main \
  --identity-name id-payments-prod \
  --resource-group rg-identity-prod \
  --issuer "https://token.actions.githubusercontent.com" \
  --subject "repo:my-org/payments-service:ref:refs/heads/main" \
  --audiences "api://AzureADTokenExchange"

The subject is the exact claim GitHub stamps into its OIDC token: branch refs use repo:ORG/REPO:ref:refs/heads/BRANCH, environments repo:ORG/REPO:environment:NAME, pull requests repo:ORG/REPO:pull_request. Get this string wrong and you get AADSTS70021: No matching federated identity record found — the most common FIC failure. Match it character-for-character.

# The matching GitHub Actions workflow. Note: no client secret anywhere.
name: deploy
on:
  push:
    branches: [main]   # must line up with the FIC subject (refs/heads/main)
permissions:
  id-token: write   # REQUIRED for the OIDC token; without it azure/login fails
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: azure/login@v2
        with:
          client-id: ${{ vars.AZURE_CLIENT_ID }}     # the identity's clientId
          tenant-id: ${{ vars.AZURE_TENANT_ID }}
          subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}
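Because the subject has to match character-for-character, it can help to generate it rather than type it. A minimal sketch covering only the three subject formats listed above; `github_subject` is a hypothetical helper, not part of any SDK.

```python
def github_subject(org, repo, *, branch=None, environment=None, pull_request=False):
    """Build the `sub` claim GitHub Actions issues for one trigger type."""
    chosen = [branch is not None, environment is not None, pull_request]
    if sum(chosen) != 1:
        raise ValueError("choose exactly one of branch, environment, pull_request")
    prefix = f"repo:{org}/{repo}"
    if branch is not None:
        return f"{prefix}:ref:refs/heads/{branch}"
    if environment is not None:
        return f"{prefix}:environment:{environment}"
    return f"{prefix}:pull_request"


subject = github_subject("my-org", "payments-service", branch="main")
print(subject)
```

Feeding the same value into both the `--subject` flag and a verification script removes one common source of typos.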

External Kubernetes / GKE

Any Kubernetes cluster that exposes its OIDC issuer can federate — point the FIC at the issuer URL and ServiceAccount subject:

# Trust a specific ServiceAccount on an external (e.g. GKE) cluster
az identity federated-credential create \
  --name gke-batch-runner \
  --identity-name id-payments-prod \
  --resource-group rg-identity-prod \
  --issuer "https://container.googleapis.com/v1/projects/PROJECT/locations/LOCATION/clusters/CLUSTER" \
  --subject "system:serviceaccount:payments:batch-runner-sa" \
  --audiences "api://AzureADTokenExchange"

The audience for Entra token exchange is always api://AzureADTokenExchange. There is a hard limit of 20 federated credentials per identity, so model subjects deliberately — a branch and an environment, not every ephemeral PR.
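When an exchange fails, comparing the external token's claims with the federated credential usually shows the mismatch. The sketch below decodes a JWT payload without verifying the signature, so it is for debugging only. `decode_claims` and `matches_fic` are hypothetical helpers, and the exact-string comparison is an assumption modelled on the character-for-character advice in this section.

```python
import base64
import json


def decode_claims(jwt):
    """Decode the payload of a JWT WITHOUT verifying its signature (debugging only)."""
    payload = jwt.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def matches_fic(claims, issuer, subject, audience="api://AzureADTokenExchange"):
    """Compare token claims to a federated credential using exact string equality."""
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    return (
        claims.get("iss") == issuer
        and claims.get("sub") == subject
        and audience in audiences
    )


def _b64(obj):
    # Helper used only to build an unsigned sample token for the demo.
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


sample_token = ".".join([
    _b64({"alg": "none"}),
    _b64({
        "iss": "https://token.actions.githubusercontent.com",
        "sub": "repo:my-org/payments-service:ref:refs/heads/main",
        "aud": "api://AzureADTokenExchange",
    }),
    "",
])

claims = decode_claims(sample_token)
ok = matches_fic(
    claims,
    issuer="https://token.actions.githubusercontent.com",
    subject="repo:my-org/payments-service:ref:refs/heads/main",
)
print(claims["sub"], ok)
```

Printing the decoded `sub` next to the output of the federated-credential list command makes a wrong branch name or environment obvious.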

5. Using user-assigned identities across AKS, Functions, VMs, and App Service

The attach mechanism differs per service; the identity is the same. VMs and VMSS attach by resource ID:

az vm identity assign \
  --name vm-batch-01 -g rg-payments \
  --identities "$RESOURCE_ID"

App Service and Functions — assign, then tell the runtime which client ID is the default so the SDK doesn’t guess:

az functionapp identity assign \
  --name func-payments -g rg-payments \
  --identities "$RESOURCE_ID"

# Make this identity the default for the runtime's token requests
az functionapp config appsettings set \
  --name func-payments -g rg-payments \
  --settings "AZURE_CLIENT_ID=${CLIENT_ID}"

AKS — Workload Identity is the modern path (aad-pod-identity is deprecated). Enable the OIDC issuer and workload-identity add-on, then federate a ServiceAccount to the identity:

# Enable on the cluster
az aks update -n aks-prod -g rg-aks \
  --enable-oidc-issuer --enable-workload-identity

# Federate the in-cluster ServiceAccount to the identity
ISSUER=$(az aks show -n aks-prod -g rg-aks --query oidcIssuerProfile.issuerUrl -o tsv)
az identity federated-credential create \
  --name aks-payments-sa \
  --identity-name id-payments-prod -g rg-identity-prod \
  --issuer "$ISSUER" \
  --subject "system:serviceaccount:payments:payments-sa" \
  --audiences "api://AzureADTokenExchange"

# The ServiceAccount and Deployment that consume it
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payments-sa
  namespace: payments
  annotations:
    azure.workload.identity/client-id: "<CLIENT_ID>"
---
apiVersion: apps/v1
kind: Deployment
metadata: { name: payments, namespace: payments }
spec:
  selector:
    matchLabels:
      app: payments                           # required; must match the pod template labels
  template:
    metadata:
      labels:
        app: payments
        azure.workload.identity/use: "true"   # opt the pod into the webhook
    spec:
      serviceAccountName: payments-sa
      containers:
        - name: app
          image: myregistry.azurecr.io/payments:1.4.2

AKS Workload Identity is itself a federated credential: the §4 primitive with the cluster as issuer. One mental model covers all three cases: GitHub Actions, external Kubernetes, and AKS.
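A quick way to confirm that a pod was actually opted in is to check the environment variables the Workload Identity webhook injects. A minimal diagnostic sketch: the function name is hypothetical, and the sample values, including the token file path, are illustrative and may differ in your cluster.

```python
import os

# Environment variables the Azure Workload Identity webhook sets on opted-in pods.
EXPECTED_ENV = (
    "AZURE_CLIENT_ID",
    "AZURE_TENANT_ID",
    "AZURE_FEDERATED_TOKEN_FILE",
    "AZURE_AUTHORITY_HOST",
)


def missing_workload_identity_env(env=None):
    """Return the expected variables that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in EXPECTED_ENV if not env.get(name)]


# Illustrative environment for a correctly labelled pod (values are made up).
sample_env = {
    "AZURE_CLIENT_ID": "00000000-0000-0000-0000-000000000000",
    "AZURE_TENANT_ID": "11111111-1111-1111-1111-111111111111",
    "AZURE_FEDERATED_TOKEN_FILE": "/var/run/secrets/azure/tokens/azure-identity-token",
    "AZURE_AUTHORITY_HOST": "https://login.microsoftonline.com/",
}
print("missing:", missing_workload_identity_env(sample_env))
print("missing without the label:", missing_workload_identity_env({}))
```

If every variable is missing, the usual suspects are the `azure.workload.identity/use` label or the `serviceAccountName` on the pod template.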

6. Accessing Key Vault, Storage, and SQL with managed identity

Key Vault needs the data-plane role (§3) and the vault set to Azure RBAC authorization. A vault still on the legacy access-policy model ignores your role assignment entirely — a silent failure:

az keyvault update -n kv-payments -g rg-payments \
  --enable-rbac-authorization true

Storage uses the OAuth path automatically once the identity holds a *Data* role — the SDK requests a https://storage.azure.com token, no connection string or account key.

Azure SQL trips people up: there is no Azure RBAC role for the SQL data plane. Create a contained database user mapped to the identity and grant roles in T-SQL — connect as an Entra admin and run:

-- Map the user-assigned identity into the database by its display name
CREATE USER [id-payments-prod] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [id-payments-prod];
ALTER ROLE db_datawriter ADD MEMBER [id-payments-prod];

The app connects with Authentication=Active Directory Managed Identity;User Id=<CLIENT_ID>; — the driver pulls the token from IMDS.

Verify

Validate the chain end to end before shipping:

# 1. The identity exists and you have its IDs
az identity show -n id-payments-prod -g rg-identity-prod \
  --query "{client:clientId, principal:principalId}" -o table

# 2. Role assignments are present AND at the scope you intended
az role assignment list --assignee "$PRINCIPAL_ID" \
  --query "[].{role:roleDefinitionName, scope:scope}" -o table

# 3. Federated credential subject matches the issuer exactly
az identity federated-credential list \
  --identity-name id-payments-prod -g rg-identity-prod \
  --query "[].{name:name, issuer:issuer, subject:subject}" -o table

# 4. From inside the workload, confirm a token actually issues
curl -s -H "Metadata: true" \
  "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://vault.azure.net&client_id=${CLIENT_ID}" \
  | python3 -c "import sys,json; t=json.load(sys.stdin); print('resource', t['resource'], 'expires_on', t['expires_on'])"

If step 4 returns a token but the data call 403s, the cause is RBAC scope, vault RBAC mode, or a stale cached token.
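To rule out the stale-token cause, it helps to look at how much lifetime the returned token has left. The sketch below parses a token response of the shape IMDS returns, where `expires_on` is a string of epoch seconds. `token_summary` is a hypothetical helper and the sample body is fabricated for illustration, with the token value elided.

```python
import json
import time


def token_summary(raw_json, now=None):
    """Summarise a managed identity token response without printing the token."""
    body = json.loads(raw_json)
    now = time.time() if now is None else now
    expires_on = int(body["expires_on"])  # string of epoch seconds in the response
    return {
        "resource": body.get("resource"),
        "seconds_left": int(expires_on - now),
        "expired": expires_on <= now,
    }


# Fabricated sample response; the access token itself is elided.
sample_response = json.dumps({
    "access_token": "<elided>",
    "expires_on": "1700003600",
    "resource": "https://vault.azure.net",
    "token_type": "Bearer",
})
summary = token_summary(sample_response, now=1700000000)
print(summary)
```

A token with plenty of lifetime left and the right `resource` points the investigation back at RBAC scope or the vault's authorization mode.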

Enterprise scenario

A platform team running a multi-tenant SaaS on AKS had ~140 microservices, each with its own system-assigned identity and Key Vault role assignments in Terraform. The problem surfaced during a blue/green cluster migration: standing up the new node pools recreated several workloads and regenerated their principalIds. Terraform showed every azurerm_role_assignment as present, but runtime tokens 403’d against Key Vault because the assignments pointed at principal IDs that no longer existed.

The fix was to make identity a first-class, decoupled resource. They moved every workload to a per-service user-assigned identity in a dedicated resource group, federated to the AKS OIDC issuer via Workload Identity, with role assignments keyed off the stable principal_id. Cluster rebuilds no longer touched identities or grants.

resource "azurerm_user_assigned_identity" "svc" {
  name                = "id-${var.service_name}-${var.env}"
  resource_group_name = azurerm_resource_group.identity.name
  location            = var.location
}

resource "azurerm_federated_identity_credential" "aks" {
  name                = "aks-${var.service_name}"
  resource_group_name = azurerm_resource_group.identity.name
  parent_id           = azurerm_user_assigned_identity.svc.id
  audience            = ["api://AzureADTokenExchange"]
  issuer              = var.aks_oidc_issuer_url
  subject             = "system:serviceaccount:${var.namespace}:${var.service_name}-sa"
}

# Stable principal_id — survives cluster and pod recreation
resource "azurerm_role_assignment" "kv" {
  scope                = azurerm_key_vault.svc.id
  role_definition_name = "Key Vault Secrets User"
  principal_id         = azurerm_user_assigned_identity.svc.principal_id
}

One identity per service kept each FIC budget local and made a compromised service’s blast radius exactly its own roles — nothing more.

Governance: tagging, inventory, and finding over-privileged identities

User-assigned identities become orphans if nobody owns the lifecycle. Tag them with owner and workload at creation; audit via Resource Graph.

// Inventory of all user-assigned identities, with owner tag
Resources
| where type =~ "microsoft.managedidentity/userassignedidentities"
| project name, resourceGroup,
          owner = tostring(tags['owner']),
          workload = tostring(tags['workload']),
          principalId = tostring(properties.principalId)
| order by resourceGroup asc

For over-privilege, the signal is role assignments at subscription or resource-group scope rather than a single resource — the grants worth challenging in review:

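# Flag subscription- and resource-group-scoped assignments (scopes with no /providers/ segment).
# Management-group scopes start with /providers/, so this filter does not catch them.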
az role assignment list --all --assignee "$PRINCIPAL_ID" \
  --query "[?contains(scope, '/providers/') == \`false\`].{role:roleDefinitionName, scope:scope}" \
  -o table

Enforce a tagging policy so untagged identities can’t be created (the built-in Azure Policy definition “Require a tag on resources”), and run the inventory query on a schedule so orphaned and over-scoped identities surface before an auditor finds them.
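For a scheduled review, the scope check can also run over the JSON output of the role assignment listing. This is a sketch under assumptions: `scope_level` and `over_scoped` are hypothetical helpers, and the classification relies only on the shape of Azure scope strings (`roleDefinitionName` and `scope` are the fields the CLI query above projects).

```python
def scope_level(scope):
    """Classify an Azure RBAC scope string by how broad it is."""
    parts = [p for p in scope.strip("/").split("/") if p]
    lowered = [p.lower() for p in parts]
    if not parts:
        return "root"
    if lowered[:2] == ["providers", "microsoft.management"]:
        return "management-group"
    if len(parts) == 2 and lowered[0] == "subscriptions":
        return "subscription"
    if len(parts) == 4 and lowered[0] == "subscriptions" and lowered[2] == "resourcegroups":
        return "resource-group"
    return "resource"


def over_scoped(assignments):
    """Return (role, scope, level) for every assignment above a single resource."""
    flagged = []
    for a in assignments:
        level = scope_level(a["scope"])
        if level != "resource":
            flagged.append((a["roleDefinitionName"], a["scope"], level))
    return flagged


sample = [
    {"roleDefinitionName": "Key Vault Secrets User",
     "scope": "/subscriptions/s1/resourceGroups/rg-payments/providers/Microsoft.KeyVault/vaults/kv-payments"},
    {"roleDefinitionName": "Reader", "scope": "/subscriptions/s1/resourceGroups/rg-payments"},
    {"roleDefinitionName": "Reader", "scope": "/subscriptions/s1"},
]
for finding in over_scoped(sample):
    print(finding)
```

Unlike a plain substring filter on `/providers/`, this classification also flags management-group scopes.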
