Dynamic Secrets in CI/CD with HashiCorp Vault: Short-Lived Cloud and Database Credentials

Most pipeline credential leaks are not exotic. They are a static AWS access key pasted into a repo secret two years ago, copied into a fork, and never rotated. The fix is not “rotate harder” — it is to stop storing the credential at all. With HashiCorp Vault dynamic secrets, the credential does not exist until a build asks for it, it is scoped to that build, and it self-destructs when the lease expires. The pipeline authenticates with its own native identity token, so there is no bootstrap secret to leak either.

This guide wires a real CI estate to Vault: JWT/OIDC auth from GitHub Actions and GitLab, least-privilege role binding with bound claims, then dynamic database, cloud (AWS/Azure/GCP), and PKI credentials, finishing with response wrapping, audit, and an emergency-revoke runbook. Everything assumes Vault 1.15+ and the vault CLI.

1. The secret-engine model: leases, TTL hierarchy, and revocation

Every dynamic credential Vault issues is wrapped in a lease. A lease has an ID, a TTL, and a max TTL. When you read from a dynamic engine, Vault creates the backend object (an IAM user, a database role, a signed cert), records the lease, and hands you both the credential and the lease_id.

Three lifecycle operations matter:

Operation      Command                               Effect
Renew          vault lease renew <lease_id>          Extends TTL, capped by max TTL
Revoke         vault lease revoke <lease_id>         Deletes the backend object now
Revoke prefix  vault lease revoke -prefix <mount>/   Kills every lease under a mount (break-glass)

TTL resolves through a hierarchy, and the shortest wins: the mount’s max_lease_ttl (set with vault secrets tune, falling back to the system-wide max_lease_ttl when unset) caps the engine, the engine/role config caps the credential, and an explicit request TTL can only go shorter. For CI you want aggressive defaults, since a build rarely needs more than its own runtime:

# Tune a mount so nothing under it can outlive the longest pipeline
vault secrets tune -default-lease-ttl=15m -max-lease-ttl=1h database/
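
The "shortest wins" rule can be sketched as plain arithmetic. This is a simplified model, not Vault's implementation: it ignores default TTLs and only takes the minimum of three caps. The function names to_seconds and effective_ttl are made up for illustration.

```shell
# to_seconds: convert a duration such as 90s, 15m, or 1h into seconds.
to_seconds() {
  case "$1" in
    *h) echo $(( ${1%h} * 3600 )) ;;
    *m) echo $(( ${1%m} * 60 )) ;;
    *s) echo "${1%s}" ;;
    *)  echo "$1" ;;              # bare number: already seconds
  esac
}

# effective_ttl MOUNT_MAX ROLE_MAX REQUESTED: the shortest of the three caps.
effective_ttl() {
  best=$(to_seconds "$1")
  for candidate in "$2" "$3"; do
    secs=$(to_seconds "$candidate")
    if [ "$secs" -lt "$best" ]; then best=$secs; fi
  done
  echo "$best"
}

effective_ttl 1h 30m 45m   # prints 1800: the 30m role max wins
```

A request for 45m against a 30m role cap is silently shortened rather than rejected, which is why a pipeline should read lease_duration from the response instead of assuming it got what it asked for.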

The key property: if a build dies, its token’s leases are revoked when the token expires, even if the pipeline never called revoke. Orphaned credentials are the exception, not the default.

2. Authenticating CI to Vault with JWT/OIDC

CI runners already hold a signed identity token. GitHub Actions mints one per job (issuer https://token.actions.githubusercontent.com); GitLab issues an ID token through the id_tokens keyword, with your GitLab base URL as issuer (the older CI_JOB_JWT_V2 variable was removed in GitLab 17.0). Vault’s jwt auth method validates that token against the provider’s JWKS, so no secret is stored in the runner.
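
When a login is rejected, it helps to see which claims the provider actually put in the token. Below is a minimal debugging sketch that decodes the payload segment of a JWT without verifying the signature (Vault does the verification). The function name jwt_payload and the fake token are made up for illustration, and base64 -d is assumed to be the GNU coreutils form.

```shell
# jwt_payload TOKEN: print the JSON claims of a JWT. No signature check is done.
jwt_payload() {
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops the padding; restore it so base64 -d accepts the input
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}

# Build an unsigned stand-in token so the sketch runs without a CI provider.
claims='{"iss":"https://token.actions.githubusercontent.com","repository":"my-org/payments-svc","environment":"prod"}'
fake_jwt="e30.$(printf '%s' "$claims" | base64 | tr -d '=\n' | tr '/+' '_-').sig"

jwt_payload "$fake_jwt"
```

Never log the full token itself in a real job: until it expires it is a usable credential.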

Enable and configure the method once per provider. For GitHub Actions:

vault auth enable -path=github-actions jwt

vault write auth/github-actions/config \
  oidc_discovery_url="https://token.actions.githubusercontent.com" \
  bound_issuer="https://token.actions.githubusercontent.com" \
  default_role="ci-build"

For GitLab self-managed, point discovery at your instance:

vault auth enable -path=gitlab jwt

vault write auth/gitlab/config \
  oidc_discovery_url="https://gitlab.example.com" \
  bound_issuer="https://gitlab.example.com"

Vault fetches the JWKS from the discovery URL and caches it, so key rotation on the provider side is automatic. Use jwt, not oidc — the oidc flow is an interactive browser redirect for humans; jwt is the non-interactive machine path.

3. Binding roles to policies with bound claims (least privilege)

A role decides which tokens may log in and what they get. Least privilege lives in bound_claims: the role only issues a Vault token if every bound claim matches the incoming JWT (exact string comparison by default, glob patterns when bound_claims_type is "glob"). Lock to a specific repo, ref, and environment, never just the org.

vault write auth/github-actions/role/deploy-prod \
  role_type="jwt" \
  user_claim="sub" \
  bound_claims_type="glob" \
  bound_claims='{"repository":"my-org/payments-svc","environment":"prod"}' \
  bound_audiences="https://vault.example.com" \
  token_policies="db-prod-read,aws-deploy" \
  token_ttl=20m \
  token_max_ttl=30m
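
To see why an org-wide binding is too loose, the matching rule can be approximated with shell patterns. This is a sketch under stated assumptions: shell case globs are a superset of Vault's glob matching (Vault only treats * specially), and the function names claim_matches and role_admits are made up for illustration.

```shell
# claim_matches PATTERN VALUE: succeed when VALUE matches the glob PATTERN.
claim_matches() {
  case "$2" in
    $1) return 0 ;;   # unquoted on purpose so the pattern is treated as a glob
    *)  return 1 ;;
  esac
}

# role_admits REPOSITORY ENVIRONMENT: every bound claim must match.
role_admits() {
  claim_matches "my-org/payments-svc" "$1" && claim_matches "prod" "$2"
}

role_admits "my-org/payments-svc" "prod"    && echo "prod deploy: admitted"
role_admits "my-org/payments-svc" "staging" || echo "staging job: rejected"
claim_matches "my-org/*" "my-org/scratch-repo" && echo "org-wide glob admits any repo"
```

The last line is the failure mode: a pattern of my-org/* would hand production policies to any repository in the organisation, including a throwaway one.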

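Two details that catch teams out: the environment claim exists only when the job declares environment:, so a job without it can never satisfy this role; and bound_audiences must equal the aud value the workflow requests, which vault-action sets through jwtGithubAudience.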

The policies the token carries are ordinary Vault policy. Keep them narrow:

# db-prod-read.hcl — only the one role, only read
path "database/creds/payments-ro" {
  capabilities = ["read"]
}

In the pipeline, the login exchanges the JWT for a Vault token:

# .github/workflows/deploy.yml
permissions:
  id-token: write   # required to mint the OIDC token
  contents: read
jobs:
  deploy:
    environment: prod
    runs-on: ubuntu-latest
    steps:
      - uses: hashicorp/vault-action@v3
        with:
          url: https://vault.example.com
          method: jwt
          path: github-actions
          role: deploy-prod
          jwtGithubAudience: https://vault.example.com   # must match bound_audiences
          # Pull a dynamic DB credential in the same step
          secrets: |
            database/creds/payments-ro username | DB_USER ;
            database/creds/payments-ro password | DB_PASS

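vault-action injects the values as masked env vars. It does not revoke anything when the job ends: the credential lives until its lease expires or the Vault token that created it expires (token_ttl=20m above), whichever comes first. To cut that window to the job’s runtime, set exportToken: true and add a final if: always() step that POSTs to /v1/auth/token/revoke-self with the exported VAULT_TOKEN, which revokes the token together with every lease it created. That last part is what makes the credential short-lived in practice, not just in theory.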
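
When a job logs in with the vault CLI in a plain run: step instead of vault-action, cleanup has to be explicit. The sketch below assumes VAULT_ADDR and VAULT_TOKEN were set by an earlier login and that the token's policy allows auth/token/revoke-self (the default policy does). The function names revoke_vault_token and run_with_cleanup are made up for illustration.

```shell
# revoke_vault_token: revoke the current token. Vault then revokes every
# lease that token created, including dynamic DB and cloud credentials.
revoke_vault_token() {
  [ -n "${VAULT_TOKEN:-}" ] || return 0      # nothing to clean up
  vault token revoke -self >/dev/null 2>&1 \
    || echo "warning: token revoke failed; leases will expire on TTL" >&2
  unset VAULT_TOKEN
}

# run_with_cleanup CMD [ARGS...]: run the command, then revoke the token
# whether the command succeeded or failed, and pass its exit status through.
run_with_cleanup() {
  status=0
  "$@" || status=$?
  revoke_vault_token
  return "$status"
}
```

A deploy step would then call something like run_with_cleanup ./deploy.sh (a hypothetical script name). If revocation itself fails, the TTL still bounds the exposure, which is why the sketch warns rather than aborts.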

4. Dynamic database credentials

The database secrets engine generates a real DB user on demand, runs your creation SQL, hands the build a unique username/password, and drops the user when the lease expires. The build never sees a shared service account.

vault secrets enable database

# Connection: Vault uses an admin/rotation account, never shared with CI
vault write database/config/payments \
  plugin_name="postgresql-database-plugin" \
  allowed_roles="payments-ro,payments-migrate" \
  connection_url="postgresql://{{username}}:{{password}}@pg.internal:5432/payments?sslmode=require" \
  username="vault_admin" \
  password="$PG_ADMIN_PW"

# Immediately rotate the admin password so even you no longer know it
vault write -force database/rotate-root/payments

Run rotate-root right after configuring. After it, the admin password lives only inside Vault. Storing the literal admin password anywhere defeats the purpose.

Define a role with the creation statements and tight TTLs:

vault write database/roles/payments-ro \
  db_name="payments" \
  creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; \
                       GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
  revocation_statements="REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM \"{{name}}\"; DROP ROLE IF EXISTS \"{{name}}\";" \
  default_ttl="15m" \
  max_ttl="30m"

vault read database/creds/payments-ro
# Key      Value
# lease_id database/creds/payments-ro/8x...
# username v-github-payments-ro-Hh3...
# password A1a-...

Each read yields a fresh, uniquely named role — so an audit log entry maps one credential to one build. No more “which of the 40 builds used the shared app_rw user?”
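
One consequence for CLI users: fetching the username and the password with two separate reads mints two different users, and the pair will not match. A minimal sketch of a single read that captures both follows. The function name load_db_creds and the DB_LEASE_ID variable are made up for illustration, and it assumes the default table output shown above with a password that contains no whitespace; -format=json with jq is the sturdier choice where jq is available.

```shell
# load_db_creds ROLE: one read, three fields.
# Sets DB_USER, DB_PASS, and DB_LEASE_ID in the calling shell.
load_db_creds() {
  creds=$(vault read "database/creds/$1") || return 1
  DB_LEASE_ID=$(printf '%s\n' "$creds" | awk '$1 == "lease_id" {print $2}')
  DB_USER=$(printf '%s\n' "$creds" | awk '$1 == "username" {print $2}')
  DB_PASS=$(printf '%s\n' "$creds" | awk '$1 == "password" {print $2}')
  # Fail loudly if the output did not contain both halves of the credential.
  [ -n "$DB_USER" ] && [ -n "$DB_PASS" ]
}
```

Calling load_db_creds payments-ro leaves a matching username/password pair in the environment, plus the lease ID needed to revoke that one credential early with vault lease revoke.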

5. Short-lived cloud credentials: AWS, Azure, GCP

AWS

The AWS engine can issue IAM users (iam_user), federation tokens, or assumed-role credentials. Prefer assumed_role (or federation_token) so you get true short-lived STS tokens rather than IAM users that need cleanup:

vault secrets enable aws
vault write aws/config/root \
  access_key="$VAULT_AWS_AK" \
  secret_key="$VAULT_AWS_SK" \
  region="eu-west-1"

vault write aws/roles/deploy \
  credential_type="assumed_role" \
  role_arns="arn:aws:iam::111122223333:role/ci-deploy" \
  default_sts_ttl="15m" max_sts_ttl="1h"

vault read aws/creds/deploy ttl=15m

The ci-deploy IAM role’s trust policy must allow Vault’s own principal to assume it. Sessions are capped at 1h here, both by max_sts_ttl and by the IAM role’s default maximum session duration; design pipelines around that, not against it.

Azure

The Azure engine creates a service principal (or assigns a role to an existing one) scoped to a subscription/resource group:

vault secrets enable azure
vault write azure/config \
  subscription_id="$ARM_SUBSCRIPTION_ID" \
  tenant_id="$ARM_TENANT_ID" \
  client_id="$ARM_CLIENT_ID" \
  client_secret="$ARM_CLIENT_SECRET"

vault write azure/roles/deploy ttl=20m max_ttl=1h azure_roles=-<<EOF
[
  {
    "role_name": "Contributor",
    "scope": "/subscriptions/$ARM_SUBSCRIPTION_ID/resourceGroups/rg-payments"
  }
]
EOF

vault read azure/creds/deploy

Microsoft Entra ID (formerly Azure AD) propagation lag is real: a freshly minted SP may not be usable for a few seconds. Vault retries internally, but if your first az call 403s, a short backoff-and-retry in the pipeline is the correct fix, not a longer TTL.
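
The backoff-and-retry can be a few lines of shell. This is a minimal sketch, assuming a doubling delay and a fixed attempt limit are acceptable; the function name retry_with_backoff is made up for illustration.

```shell
# retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS CMD [ARGS...]
# Runs CMD until it succeeds or the attempt limit is reached,
# doubling the delay after every failure.
retry_with_backoff() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    if "$@"; then return 0; fi
    if [ "$attempt" -ge "$max_attempts" ]; then return 1; fi
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}

# Example (assumes the az CLI and the rg-payments resource group):
#   retry_with_backoff 5 2 az group show --name rg-payments
retry_with_backoff 3 1 true && echo "succeeded on the first attempt"
```

With 5 attempts and a 2-second initial delay the worst case waits 2 + 4 + 8 + 16 = 30 seconds, comfortably inside a 20-minute lease.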

GCP

The GCP engine mints OAuth2 access tokens (recommended — nothing to clean up) or short-lived service-account keys:

vault secrets enable gcp
vault write gcp/config credentials=@vault-gcp-sa.json

vault write gcp/roleset/deploy \
  project="my-project" \
  secret_type="access_token" \
  token_scopes="https://www.googleapis.com/auth/cloud-platform" \
  bindings=-<<EOF
resource "//cloudresourcemanager.googleapis.com/projects/my-project" {
  roles = ["roles/storage.admin"]
}
EOF

vault read gcp/roleset/deploy/token

access_token rolesets return a ~1h OAuth token bound to a Vault-managed SA — no key material on disk, nothing to revoke later.

6. On-demand PKI for mTLS between stages

When your build stage hands off to a deploy stage (or talks to an internal artifact service), mint a short-lived client cert per run instead of shipping a long-lived key.

vault secrets enable -path=pki_int pki
vault secrets tune -max-lease-ttl=72h pki_int

# Assume an intermediate CA is already configured and signed by your root.
vault write pki_int/roles/ci-client \
  allowed_domains="ci.internal" \
  allow_subdomains=true \
  client_flag=true server_flag=false \
  key_type="ec" key_bits=256 \
  max_ttl="30m"

vault write pki_int/issue/ci-client \
  common_name="build-${GITHUB_RUN_ID}.ci.internal" ttl="20m"

The response includes certificate, private_key, and issuing_ca. The deploy service trusts your intermediate CA and validates the client cert; because each run’s cert lives 20 minutes, a leaked artifact is worthless almost immediately. Set client_flag=true/server_flag=false so the cert cannot be repurposed as a server identity.

7. Response wrapping, the agent sidecar, and secret-zero

Secret-zero is the bootstrap credential the runner needs to reach Vault. With JWT/OIDC you have largely eliminated it — the OIDC token is the identity, signed by the provider, valid for minutes. But two patterns harden the remaining edges.

Response wrapping lets one process fetch a single-use, TTL-limited token that wraps a secret. Whoever holds the wrapping token can unwrap it, but only once: if an interceptor gets there first, the intended consumer’s unwrap fails and the theft is detected. Useful when an orchestrator hands a secret to a child job:

# Orchestrator wraps a token; child unwraps exactly once
WRAP=$(vault token create -wrap-ttl=90s -policy=db-prod-read -field=wrapping_token)
# ... pass $WRAP to the child ...
vault unwrap "$WRAP"     # second attempt fails; tampering is detectable
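
Before unwrapping, the child can check where the wrapping token came from, which guards against an attacker substituting a wrapped secret of their own. A minimal sketch, assuming the child already has VAULT_ADDR set; the function name unwrap_if_expected is made up for illustration. For a token made with vault token create, the creation path is auth/token/create.

```shell
# unwrap_if_expected WRAPPING_TOKEN EXPECTED_CREATION_PATH
# Looks up the wrapping token's metadata (this does not consume it), refuses
# to continue if it was created somewhere unexpected, then unwraps once.
unwrap_if_expected() {
  actual=$(vault write -field=creation_path sys/wrapping/lookup token="$1") || {
    echo "wrapping token is invalid, expired, or already used" >&2
    return 1
  }
  if [ "$actual" != "$2" ]; then
    echo "unexpected creation_path: $actual" >&2
    return 1
  fi
  vault unwrap "$1"
}
```

A child job would call unwrap_if_expected "$WRAP" auth/token/create. A failed lookup on a token that should still be valid is the signal to treat the secret as intercepted.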

Vault Agent as a sidecar handles auto-auth and templating so application code never touches Vault directly:

# agent.hcl
auto_auth {
  method "jwt" {
    mount_path = "auth/github-actions"
    config = {
      role = "deploy-prod"
      path = "/var/run/secrets/oidc-token"   # CI writes its JWT here
    }
  }
  sink "file" { config = { path = "/run/vault-token" } }
}

template {
  contents    = "DB_DSN=postgres://{{ with secret \"database/creds/payments-ro\" }}{{ .Data.username }}:{{ .Data.password }}{{ end }}@pg.internal/payments"
  destination  = "/run/secrets/db.env"
}

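The agent renews leases and rewrites the template before expiry, so a long-running job never holds a stale credential. Critically, the OIDC token file is the only bootstrap material on disk, and it is itself short-lived. The sink token and the rendered db.env are short-lived too, but they are real credentials: keep /run on tmpfs and restrict file permissions.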

8. Audit devices, lease monitoring, and emergency revoke

Turn on an audit device before any of this carries real traffic — every request and response (with secrets HMAC’d) is logged:

vault audit enable file file_path=/var/log/vault/audit.log

Watch active leases and active credentials in flight:

vault list sys/leases/lookup/database/creds/payments-ro   # active DB creds
vault list auth/github-actions/role                        # configured roles

Emergency revoke runbook — a credential leaked from a build log:

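  1. Identify the mount/prefix from the audit log (request.path).
  2. Revoke everything under it immediately:
    vault lease revoke -prefix -sync aws/creds/deploy
    For assumed_role credentials this only clears Vault’s lease records: AWS cannot cancel an STS session that has already been issued, so also use "Revoke active sessions" on the ci-deploy IAM role to deny every session issued before that moment.
  3. If a Vault token itself leaked, revoke it and its children:
    vault token revoke <token>
    vault token revoke -accessor <accessor>
  4. Rotate the engine’s backing root if the platform account is suspect:
    vault write -force database/rotate-root/payments
    vault write -force aws/config/rotate-root
  5. Confirm zero active leases remain under the prefix via sys/leases/lookup.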

Because every credential was dynamic and lease-bound, revoke is a single prefix call — not an audit of 200 services hunting for a hard-coded key.
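
Revoking and then confirming can be scripted so the runbook is not typed from memory at 3 a.m. The sketch below relies on vault list exiting with status 2 when nothing exists at a path, and treats any other failure (for example, missing sudo capability on sys/leases/lookup) as inconclusive rather than clean. The function name revoke_and_verify is made up for illustration.

```shell
# revoke_and_verify PREFIX: revoke every lease under PREFIX, then confirm
# that the lease lookup path is empty.
revoke_and_verify() {
  vault lease revoke -prefix -sync "$1" || return 1
  rc=0
  vault list "sys/leases/lookup/$1" >/dev/null 2>&1 || rc=$?
  case "$rc" in
    2) echo "no active leases under $1" ;;
    0) echo "leases still present under $1" >&2; return 1 ;;
    *) echo "lease lookup failed (exit $rc); result unknown" >&2; return 1 ;;
  esac
}
```

Running revoke_and_verify database/creds/payments-ro either prints the all-clear or returns non-zero, so it can gate the next step of an incident script.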

Verify

Run these end to end before declaring the pipeline migrated:

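# 1. CI identity actually authenticates (run from a real job, not your laptop)
# GitHub exposes a request URL and bearer token, not the JWT itself; fetch it first
ACTIONS_ID_TOKEN=$(curl -sS -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
  "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=https://vault.example.com" | jq -r '.value')
vault write auth/github-actions/login role=deploy-prod jwt="$ACTIONS_ID_TOKEN"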

# 2. A dynamic DB cred is issued and is unique per read
vault read -field=username database/creds/payments-ro
vault read -field=username database/creds/payments-ro   # different value

# 3. The lease really expires: issue a credential, wait out the role's default_ttl
#    (15m here; the creds endpoint takes no TTL argument), then confirm login fails
vault read database/creds/payments-ro

# 4. Cloud creds are short-lived STS, not static keys
vault read -format=json aws/creds/deploy | jq '.lease_duration'

# 5. Revoke works
vault lease revoke -prefix -sync database/creds/payments-ro
vault list sys/leases/lookup/database/creds/payments-ro   # empty

If step 2 returns the same username twice, you are reading a static secret, not a dynamic one. If step 5 leaves leases, your revocation statements are failing — check the audit log.
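
The uniqueness check in step 2 can be turned into a pass/fail assertion for a migration smoke test. A minimal sketch; the function name assert_dynamic is made up for illustration. Note that it issues two real credentials, which stay live until their leases expire or are revoked.

```shell
# assert_dynamic PATH: read the username twice and succeed only if the two
# values differ, which is what a dynamic engine should produce.
assert_dynamic() {
  first=$(vault read -field=username "$1") || return 1
  second=$(vault read -field=username "$1") || return 1
  if [ "$first" = "$second" ]; then
    echo "same username twice at $1: this looks like a static secret" >&2
    return 1
  fi
}
```

assert_dynamic database/creds/payments-ro returns zero on a correctly configured role; follow it with the prefix revoke from step 5 so the two test users do not linger.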

Enterprise scenario

A payments platform team ran ~180 microservice pipelines in GitHub Actions, each holding a static app_rw Postgres password and a long-lived AWS IAM user key in repo secrets. An auditor flagged that a fork PR had once exfiltrated the DB password via printenv in a build log; rotating it meant a coordinated redeploy of every service, so it had not been rotated in 14 months.

The constraint: they could not pause deploys for a migration window, and the database team refused to grant Vault a permanent superuser. The solution was a phased cutover. First they configured the database engine with a dedicated vault_admin role that held only CREATEROLE and GRANT on the app schemas — not superuser — and ran rotate-root so even the DBA no longer knew the password:

vault write database/config/payments \
  plugin_name="postgresql-database-plugin" \
  allowed_roles="payments-ro,payments-migrate" \
  connection_url="postgresql://{{username}}:{{password}}@pg.internal:5432/payments?sslmode=require" \
  username="vault_admin" password="$BOOTSTRAP_PW" \
  password_policy="payments-strong"
vault write -force database/rotate-root/payments

Then they shipped both old and new credentials in parallel for two weeks: the pipeline preferred the Vault-issued dynamic user but fell back to the static one if the Vault step failed, with a metric on which path each build took. Once the fallback rate hit zero, they deleted the repo secret and the static app_rw role in the same change. AWS followed the same pattern via assumed_role against a per-team ci-deploy role. Net result: zero static DB or cloud credentials, every build’s database access traceable to a uniquely named role in the audit log, and emergency revoke reduced from a multi-day redeploy to a single lease revoke -prefix.
