Containerization Security

Migrating to Pod Security Admission: Enforcing Baseline and Restricted Profiles Without Breaking Workloads

PodSecurityPolicy was deprecated in Kubernetes 1.21 and removed in 1.25. If your hardening still depends on it, the upgrade that drops it is a cliff, not a ramp. The built-in replacement, Pod Security Admission (PSA), is deliberately simpler: no policy objects, no RBAC to bind, no ordering ambiguity. It trades flexibility for predictability, and that trade is the whole reason a migration can be done without an outage. This is the playbook I run: inventory first, label deliberately, fix the handful of Restricted blockers that account for almost every failure, and flip enforce only when audit has been quiet for a release cycle.

1. The model: three modes, three levels

PSA is a built-in admission controller, enabled by default since 1.23 and stable since 1.25. It evaluates Pods against the Pod Security Standards and applies a verdict per namespace, configured purely through labels. There is nothing to install.

Two axes define behavior. The mode decides what happens on a violation; the level decides how strict the bar is.

| Mode | On violation | Blocks creation? | Use for |
|---|---|---|---|
| enforce | Pod is rejected | Yes | The actual gate |
| audit | Allowed; annotation written to the audit log | No | Inventory without disruption |
| warn | Allowed; warning returned to the client | No | Feedback to whoever applied it |

| Level | Intent | Typical fit |
|---|---|---|
| privileged | Unrestricted, no constraints | System / infra namespaces only |
| baseline | Blocks known privilege escalations; minimally restrictive | Most application workloads |
| restricted | Hardened, current best practice | New workloads, regulated estates |

The three modes are independent and can each point at a different level. That is the single most important property for a safe rollout: you can set enforce to baseline while pointing audit and warn at restricted, so the cluster is protected at one bar while you measure the cost of the stricter one.
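To make that asymmetry concrete, here is a minimal bash sketch that gates at one level and audits and warns at a stricter one in a single kubectl label call. The function name set_psa_levels is hypothetical, and the sketch assumes kubectl is already pointed at the target cluster.

```shell
# Hypothetical helper: enforce one PSA level, audit and warn at another.
set_psa_levels() {
  local ns="$1" enforce="$2" target="$3" lvl
  # Reject typos before touching the namespace.
  for lvl in "$enforce" "$target"; do
    case "$lvl" in
      privileged|baseline|restricted) ;;
      *) echo "unknown PSA level: $lvl" >&2; return 1 ;;
    esac
  done
  # --overwrite makes the call safe to re-run as levels change.
  kubectl label --overwrite ns "$ns" \
    "pod-security.kubernetes.io/enforce=$enforce" \
    "pod-security.kubernetes.io/audit=$target" \
    "pod-security.kubernetes.io/warn=$target"
}
```

For example, `set_psa_levels team-payments baseline restricted` protects the namespace at baseline while every new Pod is reported against restricted.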

PSA checks only the Pod fields covered by the Pod Security Standards. It does not do image provenance, network policy, resource quotas, or anything custom. If your requirement is not one of runAsNonRoot, capabilities, host namespaces, volume types, seccompProfile, and the like, PSA is the wrong tool. For those requirements, use Kyverno or a validating webhook (step 6).

A namespace with no PSA labels inherits the cluster defaults, which out of the box are privileged for every mode. That means doing nothing leaves you wide open — an empty cluster is not secure by default, it is permissive by default.
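A quick way to see how much of a cluster runs on that permissive default is to list namespaces without an enforce label. This is a minimal bash sketch that uses kubectl's JSONPath output and awk. The function name unlabeled_namespaces is hypothetical. If your AdmissionConfiguration changes the defaults, these namespaces inherit those defaults instead of privileged.

```shell
# Hypothetical helper: print namespaces with no enforce label, i.e. those that
# fall back to the cluster-wide default level.
unlabeled_namespaces() {
  # One "<name>\t<enforce label value>" line per namespace; a missing label prints as empty.
  kubectl get ns -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.pod-security\.kubernetes\.io/enforce}{"\n"}{end}' \
    | awk -F'\t' '$2 == "" { print $1 }'
}
```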

2. Inventory violations cluster-wide before enforcing anything

Never lead with enforce. Lead with measurement. The fastest read on what a target level would cost is the dry-run check on an existing namespace, which evaluates every running Pod against a level without changing any configuration:

# What would `restricted` reject in this namespace, right now?
kubectl label --dry-run=server --overwrite ns team-payments \
  pod-security.kubernetes.io/enforce=restricted

The API server evaluates every current Pod against the level and returns warnings. Each warning names a violating Pod and the checks it fails; several Pods with the same problem may be summarized as "(and N other pods)". Nothing is persisted. Loop it across the cluster to build the estate-wide picture:

for ns in $(kubectl get ns -o jsonpath='{.items[*].metadata.name}'); do
  echo "== $ns =="
  # Violations come back on stderr as "Warning: ..." lines; match case-insensitively
  kubectl label --dry-run=server --overwrite ns "$ns" \
    pod-security.kubernetes.io/enforce=restricted 2>&1 | grep -i '^warning' || echo "clean"
done
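If you want a number rather than a wall of warnings, the dry-run output can be reduced to a per-namespace count. This bash sketch assumes the warning format current kubectl versions print:

- a header line first;
- then one "Warning: <pod>: ..." line per Pod;
- some Pods summarized as "(and N other pods)".

The function name is hypothetical, and the parsing will need adjusting if that format changes.

```shell
# Hypothetical helper: count Pods in a namespace that would violate a PSA level.
count_violating_pods() {
  local ns="$1" level="${2:-restricted}"
  kubectl label --dry-run=server --overwrite ns "$ns" \
    "pod-security.kubernetes.io/enforce=$level" 2>&1 |
  awk '
    /^Warning: existing pods in namespace/ { next }   # summary header, not a Pod
    /^Warning: / {
      n++                                            # one line names one Pod
      if (match($0, /\(and [0-9]+ other pods?\)/))   # plus any Pods folded into it
        n += substr($0, RSTART + 5, RLENGTH - 5) + 0
    }
    END { print n + 0 }'
}
```

Run it per namespace, for example `count_violating_pods team-payments baseline`, and sort the results to decide which namespaces move first.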

For a durable, queryable inventory rather than a one-shot scan, set warn and audit cluster-wide via the AdmissionConfiguration file passed to the API server. This evaluates every new and updated Pod without blocking anything:

# admission-config.yaml — referenced by --admission-control-config-file
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
  - name: PodSecurity
    configuration:
      apiVersion: pod-security.admission.config.k8s.io/v1
      kind: PodSecurityConfiguration
      defaults:
        enforce: "privileged"      # do NOT enforce yet
        enforce-version: "latest"
        audit: "restricted"        # measure the strict bar everywhere
        audit-version: "latest"
        warn: "restricted"
        warn-version: "latest"
      exemptions:
        namespaces:
          - kube-system            # never evaluate control-plane add-ons
          - kube-node-lease

On a managed control plane (EKS, AKS, GKE) you cannot pass API-server flags. There, drive the same outcome with namespace labels at the warn/audit level, or — cleaner at scale — a Kyverno policy in Audit mode that mirrors the standards. Either way the rule holds: collect the full violation set before a single namespace moves to enforce.

Audit verdicts land in the API-server audit log as annotations. If you ship audit logs to a SIEM, that is your dashboard source:

// Azure Monitor / Log Analytics example: PSA audit violations by namespace
AzureDiagnostics
| where Category == "kube-audit"
| extend ann = parse_json(log_s)
| where tostring(ann.annotations["pod-security.kubernetes.io/audit-violations"]) != ""
| summarize violations = count() by namespace = tostring(ann.objectRef.namespace)
| order by violations desc

3. Namespace labeling strategy and exemptions

PSA configuration is three labels per namespace, optionally paired with a version pin (step 7):

apiVersion: v1
kind: Namespace
metadata:
  name: team-payments
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: v1.31
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

This is the asymmetry that makes the rollout safe: enforce is held at baseline (the bar you have already cleared), while audit and warn run restricted ahead of it. The namespace is genuinely protected, and you have a live readout of the remaining work to reach restricted — without rejecting anything.

The enforce label is evaluated only at Pod create / update admission. Adding it does not evict Pods that already violate it — they keep running until their next rollout. That is a feature for migration (no surprise outage) and a trap for assurance (an unenforced violation can linger for weeks). Roll the affected Deployments deliberately once you flip enforce, and treat “labeled” and “compliant” as different states.
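Rolling a namespace after the flip can be scripted. This minimal sketch assumes bash and that every workload is owned by a Deployment, StatefulSet, or DaemonSet. Bare Pods have no controller to recreate them, so you must delete and re-apply those by hand. roll_namespace is a hypothetical name.

```shell
# Hypothetical helper: restart every controller-managed workload in a namespace
# so its Pods are re-admitted under the current PSA labels.
roll_namespace() {
  local ns="$1" kind res
  for kind in deployment statefulset daemonset; do
    # -o name yields e.g. "deployment.apps/api"
    for res in $(kubectl get "$kind" -n "$ns" -o name); do
      kubectl rollout restart "$res" -n "$ns" || return 1
      # Block until the new Pods are ready; a PSA rejection surfaces as a failure here
      kubectl rollout status "$res" -n "$ns" --timeout=10m || return 1
    done
  done
}
```

Run it off-hours, one namespace at a time. If the new level rejects the replacement Pods, `rollout status` fails instead of the problem passing silently.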

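Exemptions are the escape hatch for components that legitimately cannot satisfy any standard: CNI agents, CSI drivers, node-problem-detector, monitoring DaemonSets that read host paths. There are two mechanisms, with very different blast radius.

- A static exemption in the AdmissionConfiguration, by namespace, requesting username, or RuntimeClass. It skips enforce, audit, and warn evaluation entirely; the audit log only records that the request was exempt.
- Labeling a namespace enforce: privileged. This lifts only the gate: audit and warn labels on the same namespace still report against a stricter level.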

Never enforce restricted or baseline on kube-system. Core add-ons run privileged by design, and rejecting them will brick the control plane. Exempt it explicitly, then put the privileged components you own in dedicated namespaces (step 5) rather than dumping them into kube-system.

4. Fix the common Restricted blockers

Roughly five fields produce the overwhelming majority of restricted rejections. Knowing them turns “audit the whole estate” into a short, mechanical fixup. Restricted requires the following at the Pod and container level; readOnlyRootFilesystem is optional extra hardening, not a requirement:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  selector:
    matchLabels:
      app: api                          # required; must match the template labels
  template:
    metadata:
      labels:
        app: api
    spec:
      securityContext:
        runAsNonRoot: true              # must not run as UID 0
        seccompProfile:
          type: RuntimeDefault          # required by restricted
      containers:
        - name: api
          image: ghcr.io/acme/api@sha256:...
          securityContext:
            allowPrivilegeEscalation: false   # required
            runAsNonRoot: true                # redundant with the Pod level, harmless
            capabilities:
              drop: ["ALL"]                   # drop everything
            readOnlyRootFilesystem: true      # not required by any PSS level; optional hardening

The recurring failures and their fixes:

| Blocker | Why it fails | Fix |
|---|---|---|
| runAsNonRoot | Image defaults to root; neither runAsUser nor runAsNonRoot is set | Set runAsNonRoot: true; ensure the image has a numeric non-root user |
| seccompProfile | Unset; restricted requires it explicitly | seccompProfile.type: RuntimeDefault (or Localhost) at Pod or container level |
| capabilities | Not dropped, or adds something beyond NET_BIND_SERVICE | drop: ["ALL"]; the only add restricted permits is NET_BIND_SERVICE |
| allowPrivilegeEscalation | Effectively defaults to true when unset | Set false explicitly on every container |
| Running as root for ports < 1024 | App binds 80/443 directly | Bind a high port and remap it in the Service, or add: ["NET_BIND_SERVICE"] (for a non-root UID this typically also needs the file capability on the binary, e.g. via setcap) |
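Each row can be audited from the live cluster. As one example, this bash sketch lists containers whose allowPrivilegeEscalation is not explicitly false. It uses kubectl's JSONPath output plus awk rather than jq. The function name is hypothetical, and it checks only regular containers; initContainers and ephemeral containers need the same treatment. The other rows follow the same pattern with a different field path.

```shell
# Hypothetical helper: print "<pod>/<container>" for every container in a
# namespace whose securityContext.allowPrivilegeEscalation is not explicitly false.
ape_not_false() {
  # One line per Pod: "<pod>\t<container>=<value>\t<container>=<value>..."
  kubectl get pods -n "$1" -o jsonpath='{range .items[*]}{.metadata.name}{range .spec.containers[*]}{"\t"}{.name}{"="}{.securityContext.allowPrivilegeEscalation}{end}{"\n"}{end}' \
  | awk -F'\t' '{
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")            # kv[1] = container name, kv[2] = value or empty
        if (kv[2] != "false") print $1 "/" kv[1]
      }
    }'
}
```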

The runAsNonRoot failure catches teams off guard because runAsNonRoot: true is an assertion, not a coercion. It does not change the UID — it tells the kubelet to refuse a container whose image would run as 0. If the image has no non-root user baked in, PSA admits the Pod and the kubelet then fails it with container has runAsNonRoot and image will run as root. The real fix lives in the Dockerfile:

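# Give the image a numeric non-root user so runAsNonRoot is satisfiable
# (Alpine/BusyBox syntax; other base images need their own user-creation commands)
RUN addgroup -S -g 10001 app && adduser -S -G app -u 10001 app
USER 10001:10001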
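A CI step can catch the missing-USER case before the kubelet does. This is a rough bash sketch with these assumptions and limits:

- The Dockerfile is plain, with one instruction per line and no line continuations on the USER line.
- It tracks the last USER in the final build stage.
- It treats variable references such as ${UID} as a failure, since it cannot resolve them.

check_numeric_user is a hypothetical name.

```shell
# Hypothetical helper: succeed only if the final stage of a Dockerfile ends
# with a numeric, non-zero USER.
check_numeric_user() {
  local user
  # FROM starts a new stage, so forget any USER seen in earlier stages
  user=$(awk 'toupper($1) == "FROM" { u = "" }
              toupper($1) == "USER" { u = $2 }
              END { print u }' "$1")
  user="${user%%:*}"                      # drop an optional ":GID" suffix
  case "$user" in
    ''|*[!0-9]*) echo "USER missing or not numeric: '$user'" >&2; return 1 ;;
  esac
  [ "$user" -ne 0 ] || { echo "USER is root (0)" >&2; return 1; }
}
```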

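One more frequent trap is volumes. Baseline already forbids hostPath. Restricted goes further and permits only configMap, csi, downwardAPI, emptyDir, ephemeral, persistentVolumeClaim, projected, and secret volumes. No securityContext tweak will fix a sidecar that mounts hostPath for logs or metrics. It needs a different volume type (emptyDir, projected, CSI), or the workload has to move to a privileged namespace (step 5).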
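hostPath is the volume type that most often pushes a workload out of baseline and restricted, so a crude scan of rendered manifests before admission is cheap insurance. This bash sketch uses only grep. find_hostpath is a hypothetical name. As a text match, it can miss unusual YAML layouts such as flow style or anchors, so treat it as an early warning rather than a gate.

```shell
# Hypothetical helper: print "<line>:<text>" for each hostPath volume in a
# manifest file, or in stdin when no file is given. Exits 0 if any were found.
find_hostpath() {
  grep -n -E '^[[:space:]]*(-[[:space:]]+)?hostPath:' "${1:--}"
}
```

For example, `helm template ./chart | find_hostpath && echo "hostPath present"`.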

5. Workloads that genuinely need Privileged

Some workloads cannot be hardened: eBPF agents, GPU device plugins, storage drivers, anything touching host namespaces or devices. The mistake is granting privileged broadly to accommodate them. Quarantine them instead.

Create dedicated, clearly named namespaces, set them to privileged, and compensate with controls outside PSA’s scope — because PSA cannot express “privileged but only for this ServiceAccount”:

apiVersion: v1
kind: Namespace
metadata:
  name: infra-privileged
  labels:
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged
    purpose: privileged-system-components

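Wrap the namespace in defense in depth: restrictive RBAC on who may create or modify workloads there, a default-deny NetworkPolicy, and policy-engine rules (step 6) that still block what these components should never do, such as hostNetwork where it is not needed.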
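A default-deny NetworkPolicy is one such layer. This minimal bash sketch applies one through kubectl; apply_default_deny and the policy name default-deny-all are hypothetical. Keep three caveats in mind:

- It only takes effect with a CNI that enforces NetworkPolicy.
- Kubernetes leaves NetworkPolicy behavior for hostNetwork Pods undefined, which is another reason to keep hostNetwork out of the namespace where possible.
- Denying all egress also blocks DNS, so each component needs explicit allow rules.

```shell
# Hypothetical helper: deny all ingress and egress for every Pod in a namespace.
apply_default_deny() {
  local ns="$1"
  kubectl apply -n "$ns" -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}        # empty selector = every Pod in the namespace
  policyTypes:
    - Ingress
    - Egress
EOF
}
```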

The discipline is: privileged is a property of a small, named set of namespaces you can list on one screen — not a default that leaks into application space.
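That list can be checked mechanically. This bash sketch compares namespaces labeled enforce: privileged against an approved list; ALLOWED_PRIVILEGED and unexpected_privileged are hypothetical names. It only sees explicit labels. A namespace with no enforce label falls back to the cluster default, which is privileged out of the box, so pair this with a check for unlabeled namespaces.

```shell
# Hypothetical allowlist: replace with the namespaces you have actually approved.
ALLOWED_PRIVILEGED="kube-system infra-privileged"

# Print every namespace labeled enforce=privileged that is not on the allowlist.
unexpected_privileged() {
  local ns
  for ns in $(kubectl get ns -l pod-security.kubernetes.io/enforce=privileged \
                -o jsonpath='{.items[*].metadata.name}'); do
    case " $ALLOWED_PRIVILEGED " in
      *" $ns "*) ;;            # approved, stay quiet
      *) echo "$ns" ;;         # privileged but not on the list
    esac
  done
}
```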

6. Layer Kyverno where PSA is too coarse

PSA is intentionally blunt: a namespace is privileged, baseline, or restricted, full stop. Real estates need finer rules — “restricted everywhere, except this one DaemonSet may add SYS_PTRACE,” or “allow hostPath, but only under /var/log.” That granularity is exactly what PSA omits, and where a policy engine earns its place.

Kyverno can apply the Pod Security profiles itself, with per-control exclusions PSA cannot express:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: psa-restricted-with-exception
spec:
  validationFailureAction: Audit     # start in Audit, promote to Enforce later
  background: true
  rules:
    - name: restricted-baseline
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        podSecurity:
          level: restricted
          version: latest
          exclude:
            # allow ONLY this control, ONLY for the matched images
            - controlName: "Capabilities"
              images: ["ghcr.io/acme/network-agent*"]

The decision rule I use: PSA is the floor, Kyverno is the scalpel. Keep PSA enabled and enforcing baseline/restricted at the namespace level so there is always a backstop that survives a Kyverno outage. Reach for Kyverno only when you need an exception narrower than a whole namespace, or a control PSA does not cover (image signatures, required labels, registry allow-lists). Running both is not redundant — PSA is the dependency-free guarantee; Kyverno is the expressive layer on top.

7. Phased rollout: warn to audit to enforce, with version pinning

The version pin is the underrated control. The Pod Security Standards are versioned alongside Kubernetes and can change between releases. A level evaluated at latest, the out-of-the-box default when no -version label is set, is whatever the running API server implements. If enforce rides latest, a cluster upgrade can start rejecting Pods that were compliant yesterday with no change on your side. Pin every enforce to an explicit version, and bump it as a deliberate, reviewed change:

metadata:
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.31   # pinned, not "latest"
    pod-security.kubernetes.io/warn: restricted         # warn can ride latest
    pod-security.kubernetes.io/warn-version: latest
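When you flip enforce, you can derive the pin from the API server instead of typing it by hand. This bash sketch reads /version with kubectl get --raw and writes enforce plus enforce-version in one call. The function names are hypothetical, and the sketch strips vendor suffixes such as "31+" from the minor version. Pinning to the server version is only correct when the namespace has just passed a dry-run against that version.

```shell
# Hypothetical helper: print the API server version as a PSA pin, e.g. "v1.31".
server_psa_version() {
  local json major minor
  json=$(kubectl get --raw /version) || return 1
  # Keep only the leading digits, dropping suffixes like "+" on managed clusters
  major=$(printf '%s\n' "$json" | sed -n 's/.*"major": *"\([0-9]*\).*/\1/p')
  minor=$(printf '%s\n' "$json" | sed -n 's/.*"minor": *"\([0-9]*\).*/\1/p')
  [ -n "$major" ] && [ -n "$minor" ] || return 1
  printf 'v%s.%s\n' "$major" "$minor"
}

# Hypothetical helper: set enforce and pin it to the current server version.
pin_enforce() {
  local ns="$1" level="$2" version
  version=$(server_psa_version) || return 1
  kubectl label --overwrite ns "$ns" \
    "pod-security.kubernetes.io/enforce=$level" \
    "pod-security.kubernetes.io/enforce-version=$version"
}
```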

The per-namespace progression I run:

  1. Warn + audit at the target level (enforce still privileged or baseline). No rejections; collect violations from logs and client warnings for a full sprint.
  2. Fix the workloads using step 4. Re-run dry-run until the namespace is clean against the target.
  3. Enforce at the target, pinned to the current version. Now roll the Deployments so existing Pods are re-admitted under the new bar — labeling alone does not evict them.
  4. Hold for one release cycle, watching audit annotations for regressions from new deploys, then advance the next namespace.

Stage the levels too: most application namespaces should reach enforce: baseline first (cheap, high value), then graduate to restricted as the Dockerfile and securityContext work lands. Do not try to land restricted everywhere in one change.
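Steps 2 and 3 can be chained so that enforce never lands on a namespace that still has violators. This bash sketch assumes the kubectl dry-run warning format described in step 2. promote_if_clean is a hypothetical name. It treats any Warning: line from the dry-run as a blocker, which errs on the side of not promoting.

```shell
# Hypothetical helper: apply a pinned enforce level only if a server-side
# dry-run reports no violating Pods.
promote_if_clean() {
  local ns="$1" level="$2" version="$3" out
  out=$(kubectl label --dry-run=server --overwrite ns "$ns" \
        "pod-security.kubernetes.io/enforce=$level" \
        "pod-security.kubernetes.io/enforce-version=$version" 2>&1) || return 1
  if printf '%s\n' "$out" | grep -q '^Warning:'; then
    printf '%s\n' "$out" >&2
    echo "not promoting $ns: pods still violate $level:$version" >&2
    return 1
  fi
  # Clean: apply the same labels for real
  kubectl label --overwrite ns "$ns" \
    "pod-security.kubernetes.io/enforce=$level" \
    "pod-security.kubernetes.io/enforce-version=$version"
}
```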

Enterprise scenario

A payments platform team ran a self-managed 1.24 fleet hardened entirely with PodSecurityPolicies: dozens of PSPs, plus the web of RBAC ClusterRoles and RoleBindings that PSP required to take effect. The 1.25 upgrade removed PSP. Because enforcement depended on those policies and bindings, the upgrade did not error loudly; it just silently stopped enforcing. For three days every namespace was effectively privileged, and nobody noticed until a routine CIS benchmark scan flagged the regression.

The constraint: ~140 namespaces, a hard PCI-DSS requirement that workloads not run as root, and zero tolerance for blocking payment Deployments during business hours. A flat enforce: restricted would have rejected legacy services still running as UID 0 and taken down a node-local fraud-scoring DaemonSet mounting a hostPath socket.

What they did, in order:

  1. Set cluster-wide audit: restricted and warn: restricted via the AdmissionConfiguration (self-managed control plane), enforce left at privileged. The audit annotations went to the existing Splunk pipeline, producing a ranked list: 31 namespaces clean, 12 needing runAsNonRoot/seccompProfile fixes, 1 genuinely needing privileged.
  2. Moved the fraud-scoring DaemonSet into a dedicated infra-privileged namespace pinned to privileged, fenced with RBAC, a default-deny NetworkPolicy, and a Kyverno rule blocking hostNetwork.
  3. Promoted the 31 clean namespaces straight to enforce: restricted, version-pinned to v1.25, rolling each Deployment off-hours.
  4. Fixed the 12 laggards over two sprints — mostly a USER 10001 line in the Dockerfile and adding seccompProfile: RuntimeDefault — then enforced them.

The version pin paid off six months later: the 1.28 upgrade introduced no surprise rejections, because enforce-version was held at the level the workloads were validated against. The graduation to the 1.28 restricted profile was scheduled as its own change with its own dry-run pass.

# the load-bearing config: strict where measured, privileged only where named
defaults:
  enforce: "restricted"
  enforce-version: "v1.25"   # pinned to the validated standard
  audit: "restricted"
  warn: "restricted"
exemptions:
  namespaces: ["kube-system", "kube-node-lease", "infra-privileged"]

Verify

Confirm the enforcement is real, not just labeled.

# 1. Inspect the live PSA labels on a namespace
kubectl get ns team-payments -o jsonpath='{.metadata.labels}' | jq

# 2. Prove enforce actually rejects — this Pod violates restricted and must fail
kubectl run psa-probe --image=nginx -n team-payments
# expected: Error ... violates PodSecurity "restricted:v1.31": allowPrivilegeEscalation != false,
#           unrestricted capabilities, runAsNonRoot != true, seccompProfile ...

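# 3. Confirm existing Pods were re-admitted (not lingering from before enforce).
#    PSA only validates and never mutates Pods: enforce-policy is an audit-log
#    annotation, not a Pod annotation. Compare Pod creation times with the
#    moment enforce was flipped instead:
kubectl get pods -n team-payments --sort-by=.metadata.creationTimestamp \
  -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp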

# 4. Server-side dry-run: does anything STILL violate the target level?
kubectl label --dry-run=server --overwrite ns team-payments \
  pod-security.kubernetes.io/enforce=restricted
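Pods that predate the enforce flip were never admitted under it. This bash sketch lists them by comparing creation timestamps with the moment you applied the label. pods_created_before is a hypothetical name. The cutoff must be an RFC 3339 UTC timestamp (for example 2024-05-01T12:00:00Z) so that plain string comparison orders correctly.

```shell
# Hypothetical helper: print Pods in a namespace created before a UTC cutoff.
pods_created_before() {
  local ns="$1" cutoff="$2"
  kubectl get pods -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.creationTimestamp}{"\n"}{end}' \
  | awk -F'\t' -v cutoff="$cutoff" '$2 < cutoff { print $1 }'   # lexical compare of RFC 3339 UTC
}
```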

Catch regressions before they reach the cluster by running the standards in CI with conftest, so a non-compliant manifest fails the PR rather than the namespace:

# Evaluate rendered manifests against an OPA/Rego PSA policy in the pipeline
helm template ./chart | conftest test --policy ./policy/pod-security.rego -

A green CI gate plus a quiet audit annotation stream for a full release cycle is the signal that a namespace is genuinely converged — not the presence of the label.


Next steps: wire the conftest PSA gate into the same CI stage as your image-signature verification so admission and pipeline policy never drift, and schedule the standards version bump (e.g., to the 1.31 restricted profile) as a recurring, dry-run-gated change rather than letting a cluster upgrade decide it for you.
