Most teams stop learning CSI the moment a PVC binds. That is a mistake. The interesting half of the Container Storage Interface (snapshots, cloning, online expansion, and topology-aware placement) is exactly the half you reach for during an incident, a migration, or a 2 a.m. restore. This guide walks the full feature set with real manifests, the sidecars that make each feature work, and the failure modes that actually page you. Examples assume a cluster on Kubernetes 1.27+ with a modern CSI driver. The manifests use the AWS EBS driver; the Azure Disk, GCE PD, and Ceph drivers follow the same pattern, though driver names, parameters, and supported features differ.
1. The CSI architecture you actually need to understand
A CSI driver is not one process. It is a node-level plugin (a DaemonSet that mounts volumes onto the host) plus a controller deployment that wraps the vendor’s CSI gRPC server with a set of Kubernetes-aware sidecar containers. Each sidecar watches one kind of object and translates it into a CSI call. You should know which sidecar owns which feature, because when a feature silently does nothing, the answer is almost always “that sidecar isn’t deployed or isn’t permitted.”
| Sidecar | Watches | CSI calls it drives | Feature it enables |
|---|---|---|---|
| external-provisioner | PVC / PV | CreateVolume, DeleteVolume | Dynamic provisioning |
| external-attacher | VolumeAttachment | ControllerPublishVolume | Attach/detach to nodes |
| external-snapshotter | VolumeSnapshotContent | CreateSnapshot, DeleteSnapshot | Snapshots |
| external-resizer | PVC (spec change) | ControllerExpandVolume | Online/offline resize |
| node-driver-registrar | (none) | NodeGetInfo registration | Kubelet plugin registration |
| livenessprobe | (none) | Probe | Health endpoint |
The mental model: the controller sidecars run as a Deployment (often leader-elected, replica 2+), and they only make controller-plane calls to your cloud’s storage API. The node side does the actual NodeStageVolume / NodePublishVolume mount work and is where filesystem resize physically happens. Snapshotter and resizer are optional — a driver can ship without them, which is the first thing to check before you debug for an hour.
# Which sidecars is your driver actually running?
kubectl -n kube-system get deploy,daemonset -l app.kubernetes.io/name=aws-ebs-csi-driver
kubectl -n kube-system get pod -l app=ebs-csi-controller -o jsonpath='{.items[0].spec.containers[*].name}'
# Expect: ebs-plugin csi-provisioner csi-attacher csi-snapshotter csi-resizer liveness-probe
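If you would rather have the cluster tell you what is missing than eyeball the list, the comparison can be sketched as a small shell function. The name `missing_sidecars` is hypothetical, and the default list assumes the EBS container names from the comment above; other drivers name their containers differently.

```shell
# Hypothetical helper: print each expected sidecar container that is absent
# from a space-separated list of running container names.
# Arg 1: running container names. Arg 2 (optional): expected names.
# The default expected list assumes EBS-style naming; adjust per driver.
missing_sidecars() {
  running=" $1 "
  expected="${2:-csi-provisioner csi-attacher csi-snapshotter csi-resizer}"
  for name in $expected; do
    case "$running" in
      *" $name "*) ;;          # present, nothing to report
      *) echo "$name" ;;       # absent, print it
    esac
  done
}

# Usage against a live cluster (same selector as the command above):
# missing_sidecars "$(kubectl -n kube-system get pod -l app=ebs-csi-controller \
#   -o jsonpath='{.items[0].spec.containers[*].name}')"
```

Empty output means every expected sidecar is running. A printed `csi-snapshotter` or `csi-resizer` explains a snapshot or resize that silently does nothing.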
2. Install the snapshot CRDs and controller (they ship outside core Kubernetes)
This trips up nearly everyone. VolumeSnapshot, VolumeSnapshotContent, and VolumeSnapshotClass are not part of core Kubernetes. They live in the external-snapshotter project and consist of two pieces you must install yourself unless your managed control plane already did it:
- The three CRDs.
- The snapshot-controller, a cluster-wide controller (one per cluster) that handles the common, vendor-independent snapshot logic and binds VolumeSnapshot to VolumeSnapshotContent. This is distinct from the per-driver csi-snapshotter sidecar.
# Pin to a release tag — never apply from a moving branch in production.
SNAP_VERSION=v8.2.0
BASE=https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/${SNAP_VERSION}
# 1. CRDs
kubectl apply -f ${BASE}/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml
kubectl apply -f ${BASE}/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml
kubectl apply -f ${BASE}/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml
# 2. The shared snapshot-controller (RBAC + Deployment)
kubectl apply -f ${BASE}/deploy/kubernetes/snapshot-controller/rbac-snapshot-controller.yaml
kubectl apply -f ${BASE}/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml
Managed clusters differ. On EKS, the EBS CSI add-on ships only the sidecar, so you install the CRDs and the controller yourself (by hand, or through a separate snapshot-controller add-on where one is available). AKS and GKE install the controller and CRDs for you on recent versions. Run kubectl get crd | grep snapshot before assuming anything. A mismatched CRD apiVersion between controller and sidecar is a classic cause of snapshots that stay readyToUse: false forever.
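The grep tells you something is installed, not that all three CRDs are. A minimal sketch of a stricter check follows. The function name `missing_snapshot_crds` is hypothetical, and it assumes the `-o name` output format of `kubectl get crd`.

```shell
# Hypothetical helper: read `kubectl get crd -o name` output on stdin and
# print the short name of any of the three snapshot CRDs that is missing.
missing_snapshot_crds() {
  installed="$(cat)"
  for crd in volumesnapshotclasses volumesnapshotcontents volumesnapshots; do
    want="customresourcedefinition.apiextensions.k8s.io/${crd}.snapshot.storage.k8s.io"
    # -x matches the whole line, -F treats the name as a fixed string
    printf '%s\n' "$installed" | grep -qxF "$want" || echo "$crd"
  done
}

# Usage:
# kubectl get crd -o name | missing_snapshot_crds
```

Empty output means all three are present. It does not check which API versions each CRD serves, so a version mismatch still needs a manual look.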
3. Define a VolumeSnapshotClass and take an application-consistent snapshot
A VolumeSnapshotClass is to snapshots what a StorageClass is to volumes: it names the driver and sets a deletion policy. Retain keeps the underlying cloud snapshot when the Kubernetes object is deleted; Delete garbage-collects it.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
name: ebs-snapclass
driver: ebs.csi.aws.com
deletionPolicy: Retain
parameters:
# Driver-specific. EBS tags the snapshot; useful for cost allocation and Velero.
tagSpecification_1: "Name=k8s-csi-snapshot"
Now the crisp distinction that matters in production: CSI does not freeze your application. A snapshot is crash-consistent, equivalent to pulling the power cord. For a database that is usually recoverable via WAL replay, but “usually” is not a backup strategy. For application consistency you quiesce the workload around the snapshot. The lightest version is a checkpoint immediately before the snapshot, which shortens WAL replay on restore but does not by itself block writes:
# Application-consistent snapshot of a Postgres PVC.
# 1. Checkpoint so the on-disk state is current, then snapshot immediately.
kubectl -n data exec postgres-0 -- psql -U postgres -c "CHECKPOINT;"
cat <<'EOF' | kubectl apply -f -
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
name: postgres-snap-2026-05-30
namespace: data
spec:
volumeSnapshotClassName: ebs-snapclass
source:
persistentVolumeClaimName: data-postgres-0
EOF
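For stricter guarantees, bracket the snapshot with a brief filesystem freeze (fsfreeze -f / -u) or the engine’s hot-backup mode (pg_backup_start / pg_backup_stop on PostgreSQL 15+, both called from one session that stays open in between). Note that kubectl apply only creates the API object; the driver cuts the snapshot asynchronously. Hold the freeze until the VolumeSnapshot reports status.creationTime, which marks the point in time being captured. The copy into the cloud’s snapshot store continues afterward, so the lock window stays short. Then watch the object until it reports ready: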
kubectl -n data get volumesnapshot postgres-snap-2026-05-30 \
-o jsonpath='{.status.readyToUse} {.status.restoreSize}{"\n"}'
# true 20Gi
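A one-shot `get` is fine interactively, but a script that brackets a freeze needs to block until the field appears. A minimal polling sketch follows. The name `wait_for_snapshot_field` is hypothetical, and it assumes the driver populates `status.creationTime` once the snapshot has been cut.

```shell
# Hypothetical helper: poll a VolumeSnapshot until a status field is set
# (non-empty and not "false"), then print its value.
# Args: namespace, snapshot name, jsonpath field, max attempts, sleep seconds.
wait_for_snapshot_field() {
  ns="$1"; snap="$2"; field="$3"; tries="${4:-60}"; pause="${5:-2}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    value="$(kubectl -n "$ns" get volumesnapshot "$snap" -o "jsonpath={$field}")"
    if [ -n "$value" ] && [ "$value" != "false" ]; then
      echo "$value"
      return 0
    fi
    i=$((i + 1))
    sleep "$pause"
  done
  return 1   # gave up: the field never became set
}

# Usage: keep the freeze only until the snapshot is cut, then wait for the copy.
# wait_for_snapshot_field data postgres-snap-2026-05-30 .status.creationTime
# ...release the freeze here...
# wait_for_snapshot_field data postgres-snap-2026-05-30 .status.readyToUse 300 5
```

Splitting the two waits is the point of the design: the freeze covers only the seconds until the cut, not the minutes the upload can take.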
4. Restore a PVC from a snapshot, and clone a live volume
Restore and clone are the same mechanism: a dataSource on a fresh PVC. The provisioner sees the reference and calls CreateVolume with a source instead of provisioning empty.
Restore from a snapshot — point dataSource at the VolumeSnapshot:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: data-postgres-restored
namespace: data
spec:
storageClassName: ebs-sc
dataSource:
name: postgres-snap-2026-05-30
kind: VolumeSnapshot
apiGroup: snapshot.storage.k8s.io
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 20Gi # must be >= snapshot restoreSize
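The size rule in that last comment is easy to get wrong after a source volume has been expanded. A small pre-flight sketch follows. The helper names `to_bytes` and `restore_size_ok` are hypothetical, and only whole-number binary suffixes (Ki, Mi, Gi, Ti) are handled.

```shell
# Hypothetical helper: convert a binary-suffixed quantity (Ki, Mi, Gi, Ti) or
# a plain byte count to bytes. Decimal suffixes and fractions are not handled.
to_bytes() {
  case "$1" in
    *Ki) echo $(( ${1%Ki} * 1024 )) ;;
    *Mi) echo $(( ${1%Mi} * 1024 * 1024 )) ;;
    *Gi) echo $(( ${1%Gi} * 1024 * 1024 * 1024 )) ;;
    *Ti) echo $(( ${1%Ti} * 1024 * 1024 * 1024 * 1024 )) ;;
    *)   echo "$1" ;;
  esac
}

# Succeeds when the requested PVC size (arg 1) is at least the snapshot's
# restoreSize (arg 2).
restore_size_ok() {
  [ "$(to_bytes "$1")" -ge "$(to_bytes "$2")" ]
}

# Usage:
# size="$(kubectl -n data get volumesnapshot postgres-snap-2026-05-30 \
#   -o jsonpath='{.status.restoreSize}')"
# restore_size_ok 20Gi "$size" || echo "request is smaller than the snapshot"
```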
Clone a live PVC — point dataSource at the source PersistentVolumeClaim (no snapshot needed):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: data-postgres-clone
namespace: data
spec:
storageClassName: ebs-sc
dataSource:
name: data-postgres-0
kind: PersistentVolumeClaim
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 20Gi
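Constraints that will bite you: the driver must implement cloning (not every driver does, so check its documentation), the source and destination PVCs must be in the same namespace and use the same volumeMode, and the destination must be provisioned by the same CSI driver. Using the same StorageClass as the source is the safe default. For zonal storage the clone also has to land in the source’s zone: you cannot clone a us-east-1a volume into a PVC that resolves to us-east-1b. Finally, cloning copies live blocks, so the clone is crash-consistent against an active writer. Quiesce the source if you need the clone to be coherent.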
5. Enable allowVolumeExpansion and resize online
Set one field on the StorageClass and you unlock expansion. Note: this is one-way — you can grow a volume, never shrink it.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: ebs-sc
provisioner: ebs.csi.aws.com
allowVolumeExpansion: true # the line that enables resize
volumeBindingMode: WaitForFirstConsumer
parameters:
type: gp3
csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
To resize, edit the PVC’s spec.resources.requests.storage upward. The external-resizer calls ControllerExpandVolume to grow the backing disk, then the node plugin grows the filesystem (NodeExpandVolume). Online resize, with no pod restart, is supported by modern drivers on ext4 and xfs; the ExpandInUsePersistentVolumes feature that enables it has been GA, and therefore always on, since Kubernetes 1.24.
# Grow from 20Gi to 50Gi, in place, pod stays running.
kubectl -n data patch pvc data-postgres-0 --type merge \
-p '{"spec":{"resources":{"requests":{"storage":"50Gi"}}}}'
# Watch the two-phase progression via conditions.
kubectl -n data get pvc data-postgres-0 \
-o jsonpath='{.status.capacity.storage} | {range .status.conditions[*]}{.type}={.status} {end}{"\n"}'
# 20Gi | Resizing=True
# ...then...
# 50Gi | (conditions cleared, capacity updated)
If you see the condition FileSystemResizePending, the cloud disk grew but the node-side filesystem grow hasn’t completed — for older drivers this clears on the next pod restart. Most current drivers do it live.
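The two phases can be told apart mechanically from three values on the PVC. A minimal sketch follows. The function name `resize_state` and its wording are hypothetical, and it compares the size strings literally rather than parsing quantities.

```shell
# Hypothetical helper: classify a resize from the PVC's requested size, its
# reported capacity, and a space-separated list of condition types.
resize_state() {
  requested="$1"; capacity="$2"; conditions=" $3 "
  case "$conditions" in
    *" FileSystemResizePending "*) echo "disk grown, filesystem pending"; return ;;
    *" Resizing "*)                echo "controller expanding disk"; return ;;
  esac
  if [ "$requested" = "$capacity" ]; then
    echo "done"
  else
    echo "not started or failed"
  fi
}

# Usage:
# pvc() { kubectl -n data get pvc data-postgres-0 -o "jsonpath=$1"; }
# resize_state "$(pvc '{.spec.resources.requests.storage}')" \
#              "$(pvc '{.status.capacity.storage}')" \
#              "$(pvc '{.status.conditions[*].type}')"
```

If it reports "not started or failed", the PVC's events (`kubectl describe pvc`) are the next place to look.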
6. Topology-aware provisioning: keep volumes and pods in the same zone
Block storage in the cloud is zonal. An EBS volume in us-east-1a cannot attach to a node in us-east-1b, full stop. If the scheduler places your pod in 1b but the provisioner already cut the volume in 1a, the pod is wedged forever. The fix has two parts.
volumeBindingMode: WaitForFirstConsumer (shown above) is the critical one. It tells the provisioner not to create the volume until a pod is scheduled, so the volume is cut in the zone the scheduler actually picked. The default Immediate mode provisions eagerly and is the single most common cause of unschedulable stateful pods across zones. Use WaitForFirstConsumer for any zonal block storage.
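Before changing anything, it helps to know which classes still bind eagerly. A sketch of an audit filter follows. The function name `immediate_classes` is hypothetical, and the column layout is an assumption set by the `custom-columns` expression in the usage line.

```shell
# Hypothetical filter: read "NAME MODE PROVISIONER" rows on stdin and print
# the names of StorageClasses whose binding mode is Immediate.
immediate_classes() {
  awk '$2 == "Immediate" { print $1 }'
}

# Usage:
# kubectl get storageclass --no-headers \
#   -o custom-columns=NAME:.metadata.name,MODE:.volumeBindingMode,PROVISIONER:.provisioner \
#   | immediate_classes
```

Any zonal block-storage class in that output is a candidate for the failure described above.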
allowedTopologies narrows the candidate zones — useful when you must keep volumes out of a zone (capacity, compliance, or a paired-AZ requirement):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: ebs-sc-zoned
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
type: gp3
allowedTopologies:
- matchLabelExpressions:
- key: topology.ebs.csi.aws.com/zone
values:
- us-east-1a
- us-east-1b
The topology key is driver-specific (topology.ebs.csi.aws.com/zone, topology.gke.io/zone, etc.) — copy it from the driver’s CSINode object, do not assume topology.kubernetes.io/zone. For a StatefulSet that must spread one replica per zone, pair this StorageClass with topologySpreadConstraints on the pod template so the scheduler distributes replicas and the provisioner follows.
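Reading the key off the CSINode object can be sketched as a one-function wrapper. The function name `driver_topology_keys` is hypothetical; the fields it reads (`spec.drivers[].name` and `spec.drivers[].topologyKeys`) are part of the CSINode API.

```shell
# Hypothetical helper: print each CSI driver registered on a node together
# with the topology keys it reported. Arg 1: node name.
driver_topology_keys() {
  kubectl get csinode "$1" \
    -o jsonpath='{range .spec.drivers[*]}{.name}{"\t"}{.topologyKeys}{"\n"}{end}'
}

# Usage:
# driver_topology_keys "$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')"
```

Whatever key appears next to your driver's name is the one to put under matchLabelExpressions.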
7. Back it up properly: Velero with CSI snapshot data movement
Native CSI snapshots are fast but they usually live in the same account and region as the source disk. That is not disaster recovery; it is a fast undo. Velero closes the gap. Its CSI support takes a VolumeSnapshot and, with the built-in data mover (introduced in Velero 1.12), copies the snapshot’s contents through the Kopia uploader to object storage, which can sit in another region. That decouples your backup from the cloud snapshot’s lifecycle.
velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.10.0 \
--bucket velero-prod-backups \
--backup-location-config region=us-west-2 \
--use-node-agent \
--features=EnableCSI \
--snapshot-location-config region=us-east-1
# Back up a namespace and MOVE snapshot data to the bucket (cross-region durable).
velero backup create data-2026-05-30 \
--include-namespaces data \
--snapshot-move-data \
--wait
--snapshot-move-data is the flag that turns an in-region CSI snapshot into a portable, cross-region backup object. Restores then provision fresh PVCs from that data, which also lets you restore into a different cluster or region — the real test of any backup.
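A backup you have never restored is a hypothesis. A minimal restore drill, sketched as a function, maps the backed-up namespace onto a scratch one so the live workload is untouched. The function name `restore_drill`, the `drill-` prefix, and the scratch namespace are hypothetical; `--from-backup`, `--namespace-mappings`, and `--wait` are standard `velero restore create` flags.

```shell
# Hypothetical drill: restore a backup into a scratch namespace.
# Args: backup name, source namespace, scratch namespace.
restore_drill() {
  velero restore create "drill-$1" \
    --from-backup "$1" \
    --namespace-mappings "$2:$3" \
    --wait
}

# Usage:
# restore_drill data-2026-05-30 data data-restore-test
# kubectl -n data-restore-test get pvc,pod
```

Delete the scratch namespace afterward; with a Delete reclaim policy the restored volumes are cleaned up with it.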
Verify
Run these end-to-end before you trust the setup. Each line proves one feature actually works rather than merely being configured.
# Snapshot CRDs + controller present
kubectl get crd | grep snapshot.storage.k8s.io # 3 CRDs
kubectl -n kube-system get deploy snapshot-controller # 1/1 ready
# Snapshot reaches readyToUse
kubectl -n data get volumesnapshot -o wide
# Restored PVC binds and the cloud reports the right size
kubectl -n data get pvc data-postgres-restored # STATUS=Bound
# Expansion actually grew the filesystem inside the pod
kubectl -n data exec postgres-0 -- df -h /var/lib/postgresql/data # shows 50G
# Topology landed the volume in the pod's zone (they MUST match)
kubectl get pv -o custom-columns=\
'NAME:.metadata.name,ZONE:.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]'
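# Resolve the pod's node, then show that node's zone label
NODE=$(kubectl -n data get pod postgres-0 -o jsonpath='{.spec.nodeName}')
kubectl get node "$NODE" -L topology.kubernetes.io/zone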
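The zone comparison is worth automating, since a mismatch is the one result here you cannot afford to misread. A sketch follows. The function name `pod_volume_zone_check` is hypothetical, and it assumes the zone is the first value of the PV's first node-affinity expression (as in the command above) and that nodes carry the `topology.kubernetes.io/zone` label.

```shell
# Hypothetical check: print the PV's zone and the pod's node zone, and
# succeed only when both are set and equal.
# Args: namespace, pod name, PVC name.
pod_volume_zone_check() {
  ns="$1"; pod="$2"; pvc="$3"
  pv="$(kubectl -n "$ns" get pvc "$pvc" -o jsonpath='{.spec.volumeName}')"
  pv_zone="$(kubectl get pv "$pv" -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]}')"
  node="$(kubectl -n "$ns" get pod "$pod" -o jsonpath='{.spec.nodeName}')"
  node_zone="$(kubectl get node "$node" -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}')"
  echo "pv=$pv_zone node=$node_zone"
  [ -n "$pv_zone" ] && [ "$pv_zone" = "$node_zone" ]
}

# Usage:
# pod_volume_zone_check data postgres-0 data-postgres-0 || echo "ZONE MISMATCH"
```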
If a snapshot is stuck on readyToUse: false, describe its VolumeSnapshotContent and read the status.error — that is where the driver’s real message surfaces, not on the VolumeSnapshot.
Enterprise scenario
A fintech platform team ran a 40-node multi-tenant Postgres fleet on EKS across three AZs, provisioned by the EBS CSI driver with the default Immediate binding mode inherited from an old StorageClass. It worked until a regional capacity event in us-east-1a forced the cluster autoscaler to bring up replacement nodes only in 1b and 1c. New StatefulSet replicas scheduled onto the surviving zones — but the provisioner, in Immediate mode, had already cut their PVCs in 1a minutes earlier, so the volumes could not attach. Roughly a third of the fleet’s restarting pods wedged in Pending with node(s) had volume node affinity conflict, and the team could not fail over.
The root cause was binding mode, not capacity. They switched the StorageClass to WaitForFirstConsumer so the volume is provisioned only after the scheduler commits a pod to a node, guaranteeing zone co-location, and constrained placement with allowedTopologies plus a per-zone spread constraint. They also moved their hourly snapshots to Velero with --snapshot-move-data into us-west-2, so a zone or region event no longer stranded both the data and its backup in the same place.
# The one-line change that prevents cross-zone attach deadlock.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: ebs-sc-zoned
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer # was: Immediate
allowVolumeExpansion: true
parameters:
type: gp3
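Because volumeBindingMode (like provisioner and parameters) is immutable on an existing StorageClass, and a StatefulSet’s volumeClaimTemplates cannot be edited either, the migration was create-new-class plus recreation of the StatefulSets onto it, not an in-place edit. Volumes that already exist stay where they are; only newly provisioned ones follow the new class. Plan that window.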
Troubleshooting: stuck attachments, finalizers, and orphans
These are the recurring failure modes, in order of how often they page.
- Volume won’t detach after a node dies. The VolumeAttachment lingers because the node never confirmed the unmount. Inspect it: kubectl get volumeattachment | grep <pv-name>. If the node is genuinely gone, the 6-minute force-detach timer in kube-controller-manager’s attach/detach controller normally clears it. Only if it is truly stuck, remove the finalizer manually, and confirm the disk is detached in the cloud console first, or you risk multi-attach corruption:
  kubectl patch volumeattachment <name> --type merge \
    -p '{"metadata":{"finalizers":[]}}'
- PVC stuck Terminating. A kubernetes.io/pvc-protection finalizer means a pod still references it. Find the holder before deleting: kubectl describe pvc <name> lists mounting pods under Used By. Do not strip the finalizer to force it. Delete the consuming pod instead, or you orphan the PV.
- Orphaned PV / leaked cloud disk. With reclaimPolicy: Retain, deleting the PVC leaves the PV Released and the cloud disk billed. Reconcile periodically: list Released PVs, and cross-check the snapshots kept by a Retain VolumeSnapshotClass against what you expect. Those are the quiet line items that inflate the storage bill.
  kubectl get pv | grep -w Released
- Snapshot readyToUse: false forever. Almost always the snapshot-controller is missing, or CRD/controller versions mismatch the sidecar. Re-check section 2, then read status.error.message on the VolumeSnapshotContent.
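For the orphaned-PV reconciliation, the useful output is the cloud volume ID, not just the PV name. A sketch of that listing follows. The function name `released_only` is hypothetical, and the column order is an assumption fixed by the `custom-columns` expression in the usage line.

```shell
# Hypothetical filter: read "NAME PHASE POLICY HANDLE" rows on stdin and print
# the PV name and backing volume handle for rows whose phase is Released.
released_only() {
  awk '$2 == "Released" { print $1, $4 }'
}

# Usage:
# kubectl get pv --no-headers \
#   -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,POLICY:.spec.persistentVolumeReclaimPolicy,HANDLE:.spec.csi.volumeHandle \
#   | released_only
```

The handle column is what you match against the cloud provider's disk inventory before deleting anything.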