“Multi-tenancy” on Kubernetes is not one decision; it is a slider. At one end you give every tenant a namespace and trust RBAC. At the other you give every tenant their own cluster and trust nothing shared. Every option in between trades blast radius against cost and operational toil. The mistake teams make is picking a point on that slider once, globally, and then discovering six months in that their “tenants” are actually three different populations — internal app teams, a partner integration, and an untrusted CI workload — each of which needs a different isolation tier. This guide treats tenancy as a tiered design: pick the weakest model that satisfies the threat model for each tenant class, and make the boundary enforceable rather than aspirational.
1. Pick a tenancy model per tenant class, not per cluster
Kubernetes gives you a namespace as the unit of naming and RBAC, but a namespace is emphatically not a security boundary on its own. Pods in different namespaces can run on the same nodes, and therefore on the same kernel; they also share the same CNI and, critically, the same API server. The model you choose decides which of those you actually isolate.
| Model | Isolates API server | Isolates CRDs and cluster-scoped objects | Kernel/node isolation | Cost per tenant | Good for |
|---|---|---|---|---|---|
| Namespace-per-tenant (“soft”) | No | No | No | Near zero | Trusted internal teams |
| Hierarchical namespaces (HNC) | No | No | No | Near zero | Org structure, policy inheritance |
| Virtual cluster (vcluster) | Yes | Yes | No (shared host nodes) | Low | Tenants needing CRDs/their own API |
| Cluster-per-tenant | Yes | Yes | Yes | High | Untrusted or compliance-bound tenants |
The decisive questions, in order:
- Does the tenant need to install CRDs or cluster-scoped resources? If yes, a shared namespace is out — CRDs are global, and two tenants wanting different versions of the same CRD will collide. This pushes you to vcluster or a dedicated cluster.
- Is the tenant trusted not to attempt kernel-level escape? If no — untrusted code, partner workloads, anything internet-facing and high-risk — soft isolation is insufficient regardless of policy, because a container escape lands on a shared node. This pushes you toward sandboxed runtimes (Section 6) or dedicated nodes/clusters.
- Do tenants need to see each other’s objects at all? Soft isolation leaks API-server-level metadata (node names, events, sometimes other namespaces depending on RBAC). vcluster gives each tenant a complete, separate Kubernetes API.
The principal-level framing: soft multi-tenancy is a cost optimization, not a security control. It is correct for tenants you would already trust on a shared cluster. The moment “tenant” means “someone I do not trust,” you are choosing between a sandbox runtime and a separate cluster, and you should price both before assuming soft tenancy is cheaper.
2. Resource governance: ResourceQuota, LimitRange, and priority fairness
Before isolation, solve sharing. The first failure mode in any shared cluster is not a breach — it is one tenant scheduling 400 pods and starving everyone else. Three objects fix this, and they are not interchangeable.
ResourceQuota caps the aggregate a namespace may consume. LimitRange constrains individual containers and pods and supplies default requests and limits, so a pod that omits them is still admitted and counted against the quota instead of being rejected.
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: tenant-acme
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 64Gi
    limits.cpu: "40"
    limits.memory: 128Gi
    pods: "100"
    services.loadbalancers: "2"
    count/persistentvolumeclaims: "20"
    requests.storage: 500Gi
```
A ResourceQuota that constrains requests.cpu or limits.memory has a sharp edge: once it is in force, every pod in the namespace must declare the corresponding request/limit, or admission rejects the pod. Tenants will not do this reliably. LimitRange backstops them with defaults and floors:
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: tenant-defaults
  namespace: tenant-acme
spec:
  limits:
  - type: Container
    default:            # applied as limit if the container omits one
      cpu: "500m"
      memory: 256Mi
    defaultRequest:     # applied as request if the container omits one
      cpu: "100m"
      memory: 128Mi
    max:
      cpu: "4"
      memory: 8Gi
    min:
      cpu: "50m"
      memory: 32Mi
```
Quota stops a tenant from consuming too much in total; it does not guarantee that under contention their pods win against another tenant’s. For that you need scheduling priority. Define a per-tier PriorityClass and pin tenant pods to it so a platinum tenant preempts a free-tier batch job rather than queueing behind it:
```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: tenant-platinum
value: 100000
globalDefault: false
preemptionPolicy: PreemptLowerPriority
description: "Production tenants; may preempt free-tier/batch."
```
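For contrast, a lower tier might look like the sketch below. The class name tenant-free, the value 1000, and the pod name are illustrative assumptions. preemptionPolicy: Never lets these pods queue ahead of lower-priority pending pods without ever evicting running ones, and a tenant pod opts into a tier through spec.priorityClassName.

```yaml
# Hypothetical free tier: low priority, and never preempts running pods.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: tenant-free
value: 1000
globalDefault: false
preemptionPolicy: Never
description: "Free-tier/batch tenants; preemptible, never preempts."
---
# A tenant pod selects its tier by name.
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker            # hypothetical
  namespace: tenant-free-001
spec:
  priorityClassName: tenant-free
  containers:
  - name: worker
    image: registry.k8s.io/pause:3.9
```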
Crucially, also quota the priority classes themselves with a scoped ResourceQuota, or a free-tier tenant will simply set priorityClassName: tenant-platinum and defeat the whole scheme:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: deny-high-priority
  namespace: tenant-free-001
spec:
  hard:
    pods: "0"           # zero pods allowed at this priority in this namespace
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values: ["tenant-platinum"]
```
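A per-namespace deny quota has to be remembered for every new low-tier namespace. If you control the API server flags (managed control planes such as EKS generally do not expose them), the ResourceQuota admission plugin can invert the default: pods in a listed priority class are admitted only in namespaces that hold a matching quota. A sketch, assuming the first document is passed to the API server via --admission-control-config-file and that tenant-acme is a platinum tenant; the quota name allow-platinum is illustrative.

```yaml
# File 1: API server admission configuration (not a kubectl object).
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: ResourceQuota
  configuration:
    apiVersion: apiserver.config.k8s.io/v1
    kind: ResourceQuotaConfiguration
    limitedResources:
    - resource: pods
      matchScopes:
      - scopeName: PriorityClass
        operator: In
        values: ["tenant-platinum"]
---
# File 2: applied with kubectl; explicitly grants platinum pods to this namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: allow-platinum
  namespace: tenant-acme
spec:
  hard:
    pods: "100"
  scopeSelector:
    matchExpressions:
    - scopeName: PriorityClass
      operator: In
      values: ["tenant-platinum"]
```

With this in place, a free-tier namespace that never got a deny quota still cannot run platinum pods, because no matching quota exists there.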
API-server fairness is the third axis. A tenant hammering the API with list/watch loops can degrade the control plane for everyone. API Priority and Fairness (APF, GA since 1.29) lets you carve API concurrency into queues. Bind a noisy tenant’s service accounts to a low-share PriorityLevelConfiguration via a FlowSchema so their requests cannot crowd out control-plane traffic.
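A sketch of that binding, assuming Kubernetes 1.29+ (flowcontrol.apiserver.k8s.io/v1) and a hypothetical priority level named tenant-low that catches every service account in tenant-acme. The share and queue numbers are placeholders to tune, not recommendations. A lower matchingPrecedence is evaluated first; list the existing schemas with kubectl get flowschemas and place yours behind the system ones but ahead of the generic service-accounts schema.

```yaml
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: PriorityLevelConfiguration
metadata:
  name: tenant-low
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 5    # small share of API server concurrency
    lendablePercent: 0             # do not lend these seats to other levels
    limitResponse:
      type: Queue                  # queue excess requests rather than rejecting at once
      queuing:
        queues: 16
        handSize: 4
        queueLengthLimit: 50
---
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  name: tenant-acme-service-accounts
spec:
  priorityLevelConfiguration:
    name: tenant-low
  matchingPrecedence: 1000         # lower number = evaluated earlier
  distinguisherMethod:
    type: ByUser                   # fair-queue per service account within the level
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        name: "*"                  # every service account in the namespace
        namespace: tenant-acme
    resourceRules:
    - verbs: ["*"]
      apiGroups: ["*"]
      resources: ["*"]
      namespaces: ["*"]
      clusterScope: true
```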
3. Hierarchical Namespace Controller for policy inheritance
Flat namespaces force you to copy RBAC, quotas, and NetworkPolicies into every tenant by hand, and they drift. The Hierarchical Namespace Controller (HNC) — a kubernetes-sigs project — adds parent/child relationships so policy propagates down a tree. A tenant gets a root namespace; their environments become subnamespaces that inherit the tenant’s RoleBindings automatically.
```shell
# Install the HNC CRDs and controller (pin a real release tag in production)
HNC_VERSION=v1.1.0
kubectl apply -f https://github.com/kubernetes-sigs/hierarchical-namespaces/releases/download/${HNC_VERSION}/default.yaml

# Install the kubectl-hns plugin for ergonomics
kubectl krew install hns
```
Create a tenant root, then let the tenant create subnamespaces beneath it. A SubnamespaceAnchor in the parent is the request; HNC creates the actual namespace and enforces that it cannot outlive or escape its parent:
```shell
kubectl create namespace tenant-acme
kubectl hns create acme-dev -n tenant-acme     # creates subnamespace, parented to tenant-acme
kubectl hns create acme-stage -n tenant-acme
kubectl hns tree tenant-acme
# tenant-acme
# ├── [s] acme-dev
# └── [s] acme-stage
#
# [s] indicates subnamespaces
```
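kubectl hns create is a convenience for writing a SubnamespaceAnchor. The same request in declarative form, suitable for GitOps, might look like this, assuming HNC v1.x, whose API group is hnc.x-k8s.io/v1alpha2:

```yaml
# Declarative equivalent of: kubectl hns create acme-dev -n tenant-acme
apiVersion: hnc.x-k8s.io/v1alpha2
kind: SubnamespaceAnchor
metadata:
  name: acme-dev           # name of the subnamespace HNC will create
  namespace: tenant-acme   # the parent namespace that owns it
```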
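Now anything you place in tenant-acme (a RoleBinding granting the tenant’s group edit, a default NetworkPolicy, a LimitRange) propagates into every child, as long as its type is set to propagate: Roles and RoleBindings do by default, and everything else must be enabled. HNC marks propagated objects and prevents them from being modified or deleted in the children. You control what propagates per type: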
```shell
# Roles and RoleBindings already propagate by default; they need no config here.
# Propagate LimitRanges and NetworkPolicies down every tree; do NOT propagate ResourceQuotas
# (you usually want per-namespace quotas, not an inherited one)
kubectl hns config set-resource limitranges --mode Propagate
kubectl hns config set-resource networkpolicies --group networking.k8s.io --mode Propagate
kubectl hns config set-resource resourcequotas --mode Remove
```
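The set-resource commands edit HNC’s cluster-wide HNCConfiguration singleton. A declarative sketch of the propagation modes discussed above, assuming the v1alpha2 API and the object name config used by HNC v1.x:

```yaml
# Cluster-wide HNC settings (singleton named "config").
# Roles and RoleBindings propagate by default and need no entry here.
apiVersion: hnc.x-k8s.io/v1alpha2
kind: HNCConfiguration
metadata:
  name: config
spec:
  resources:
  - resource: limitranges          # core API group: no group field
    mode: Propagate
  - group: networking.k8s.io
    resource: networkpolicies
    mode: Propagate
  - resource: resourcequotas
    mode: Remove                   # delete propagated copies; keep quotas per namespace
```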
HNC’s gain is operational, not security: it does not isolate the API server or the kernel. It eliminates the drift and copy-paste that make flat soft-tenancy unmaintainable past a few dozen namespaces, and it gives tenants a safe self-service primitive (create a subnamespace) without cluster-admin.
4. Virtual clusters with vcluster: API isolation and CRD freedom
When a tenant needs their own Kubernetes API — to install CRDs, run their own operators, define cluster-scoped RBAC, or pin a different API behavior — a shared namespace cannot deliver it. A virtual cluster (vcluster, by LoftLabs) runs a real, lightweight Kubernetes control plane (API server + controller-manager backed by an embedded datastore such as SQLite or an external one) inside a single host namespace. The tenant’s API server is genuinely separate; their pods, however, are synced down and scheduled on the host cluster’s nodes, so you keep one pool of compute.
```shell
# Install the vcluster CLI, then create a virtual cluster inside a host namespace
vcluster create acme --namespace tenant-acme-vc

# Connect: this opens a kubeconfig context pointed at the tenant's *own* API server
vcluster connect acme --namespace tenant-acme-vc
kubectl get namespaces           # tenant sees only THEIR namespaces, not the host's
kubectl apply -f some-crd.yaml   # tenant installs CRDs freely; isolated to their vcluster
```
What the tenant sees is a clean cluster. What actually happens: the vcluster syncer translates the tenant’s pods, services, secrets, and configmaps into the host namespace (rewriting names to avoid collisions) and schedules them on host nodes. CRDs, RBAC, and most cluster-scoped objects live only in the virtual control plane and never touch the host API.
Limit the host-side blast radius with a vcluster.yaml that controls which resources the syncer copies to the host and restricts which host nodes the tenant can see:
```yaml
# vcluster.yaml; install with: vcluster create acme -n tenant-acme-vc -f vcluster.yaml
sync:
  toHost:
    pods:
      enabled: true
    ingresses:
      enabled: true
  fromHost:
    nodes:
      enabled: true
      selector:
        # tenant's vcluster only "sees" nodes in this pool, for scheduling
        labels:
          tenant-pool: "acme"
controlPlane:
  distro:
    k8s:
      enabled: true
```
vcluster’s isolation properties: strong at the API/control-plane layer (separate API server, separate etcd-equivalent, separate CRD and RBAC space), weak at the kernel layer (pods still land on shared host nodes). It is the right tier for tenants you trust at the kernel level but who need API autonomy. For untrusted tenants, combine vcluster with the runtime isolation in Section 6, or do not share nodes at all.
5. Network isolation: default-deny and per-tenant ingress
Tenancy is only as strong as the network boundary, and the default boundary is none — every pod can reach every other pod cluster-wide. Establish a baseline default-deny per tenant namespace (in the soft/HNC model) so a compromised pod in tenant-acme cannot reach tenant-globex:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: tenant-acme
spec:
  podSelector: {}                  # selects every pod in the namespace
  policyTypes: [Ingress, Egress]
  # no ingress/egress rules => deny both directions
```
Then add back only intended flows. The two flows every tenant needs: DNS egress to kube-dns, and ingress from the shared ingress controller. Allow them narrowly with a namespace selector rather than opening the namespace up:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-and-ingress
  namespace: tenant-acme
spec:
  podSelector: {}
  policyTypes: [Ingress, Egress]
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:                 # same list element as namespaceSelector: AND, not OR
        matchLabels:
          k8s-app: kube-dns        # CoreDNS label on most distributions; verify on yours
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
```
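The default-deny also cuts traffic between a tenant’s own pods. If the tenant’s services call each other inside the namespace, a sketch of a policy that re-allows only same-namespace traffic (the policy name is illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: tenant-acme
spec:
  podSelector: {}
  policyTypes: [Ingress, Egress]
  ingress:
  - from:
    - podSelector: {}   # no namespaceSelector: any pod in this same namespace
  egress:
  - to:
    - podSelector: {}   # no namespaceSelector: any pod in this same namespace
```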
Two operational notes that bite teams. First, NetworkPolicy is enforced by the CNI, not the API server; on a CNI that ignores it (e.g., a plain Flannel install) the YAML applies and does nothing, so verify enforcement before trusting it. Second, in a soft-tenancy model you cannot let tenants self-define cross-namespace allows, or they will simply allow themselves into a neighbor; enforce this with an admission policy (Kyverno/Gatekeeper) that forbids namespaceSelector references outside the tenant’s own subtree.

For per-tenant ingress, give each tenant a hostname/path on a shared controller and isolate TLS with a tenant-scoped Secret, or run a dedicated ingress controller per high-tier tenant if you need full data-plane separation.
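Checking that a namespaceSelector stays inside a tenant’s subtree, as the second operational note above requires, takes careful matching logic. A stricter and simpler variant, sketched here as a Kyverno ClusterPolicy, rejects any tenant-authored NetworkPolicy that uses namespaceSelector at all and leaves cross-namespace allows to the platform. The tenant namespace label and the platform-admins group are hypothetical; exclude whichever identities actually create platform policies (a GitOps controller, or HNC’s own service account if it propagates them), or those copies will be rejected too.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-tenant-namespaceselector
spec:
  validationFailureAction: Enforce
  rules:
  - name: no-cross-namespace-peers
    match:
      any:
      - resources:
          kinds: ["NetworkPolicy"]
          namespaceSelector:
            matchExpressions:
            - key: tenant              # hypothetical label on tenant namespaces
              operator: Exists
    exclude:
      any:
      - subjects:
        - kind: Group
          name: platform-admins        # hypothetical group allowed to author cross-namespace rules
    validate:
      message: "Tenant NetworkPolicies may not use namespaceSelector; request cross-namespace rules from the platform team."
      pattern:
        spec:
          =(ingress):                  # if ingress rules exist...
          - =(from):                   # ...no peer in any rule...
            - X(namespaceSelector): "null"   # ...may carry a namespaceSelector
          =(egress):
          - =(to):
            - X(namespaceSelector): "null"
```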
6. Runtime and node isolation with gVisor/Kata and dedicated pools
For tenants you do not trust at the kernel level, no amount of NetworkPolicy or RBAC helps: a container escape is a host compromise, and the host is shared. Two mechanisms raise the floor.
Sandboxed runtimes put a barrier between the container and the host kernel. gVisor (runsc) intercepts syscalls in a userspace kernel; Kata Containers run each pod in a lightweight VM. Register the runtime as a RuntimeClass, then opt tenant pods into it:
```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc   # must match the containerd runtime handler configured on nodes
```

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: untrusted-job
  namespace: tenant-partner
spec:
  runtimeClassName: gvisor   # this pod runs under the sandbox
  containers:
  - name: app
    image: registry.example.com/partner/job:1.4.2
```
Enforce with an admission policy that every pod in an untrusted tenant’s namespace uses the sandbox, so a tenant cannot omit runtimeClassName and land on the bare runtime. Dedicated node pools complete the picture: label and taint a pool for the tenant, and put the matching nodeSelector and toleration in the sandboxed RuntimeClass’s scheduling block. The RuntimeClass admission controller merges both into every pod that names the class, so untrusted workloads land only on that pool, while the taint keeps platform and other-tenant pods off it:
```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor-isolated
handler: runsc
scheduling:
  nodeSelector:
    tenant-isolation: sandboxed
  tolerations:
  - key: tenant-isolation
    operator: Equal
    value: sandboxed
    effect: NoSchedule
```
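If you would rather not run Kyverno or Gatekeeper for the admission policy mentioned above, the built-in ValidatingAdmissionPolicy (admissionregistration.k8s.io/v1, GA in Kubernetes 1.30) can enforce the sandbox requirement with a CEL expression. A sketch, assuming untrusted tenant namespaces carry a hypothetical tenant-trust: untrusted label; it checks only CREATE because a pod’s runtimeClassName is immutable afterwards.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-sandbox-runtime
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      operations: ["CREATE"]
      resources: ["pods"]
  validations:
  - expression: >-
      has(object.spec.runtimeClassName) &&
      object.spec.runtimeClassName in ['gvisor', 'gvisor-isolated']
    message: "Pods in untrusted tenant namespaces must use a gVisor RuntimeClass."
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: require-sandbox-runtime-untrusted
spec:
  policyName: require-sandbox-runtime
  validationActions: [Deny]
  matchResources:
    namespaceSelector:
      matchLabels:
        tenant-trust: untrusted   # hypothetical label on untrusted tenant namespaces
```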
The honest tradeoff: sandboxes add per-pod overhead and break some workloads (certain syscalls, some CSI drivers, GPU passthrough). They are the correct tier when “tenant” means untrusted code but a full cluster-per-tenant is too expensive. When in doubt for regulated or hostile workloads, separate clusters remain the only model with no shared kernel.
7. Tenant onboarding automation and self-service guardrails
None of the above scales if onboarding a tenant is a ticket. Encode a tenant as a single declarative object and let a controller fan it out into namespace + quota + RBAC + NetworkPolicy + (optionally) a vcluster. The Capsule project (clastix) models this as a Tenant CRD; many platforms build their own. The shape:
```yaml
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: acme
spec:
  owners:
  - name: acme-admins
    kind: Group
  namespaceOptions:
    quota: 5                 # tenant may self-create up to 5 namespaces
  resourceQuotas:
    scope: Tenant            # quota pooled across all the tenant's namespaces
    items:
    - hard:
        limits.cpu: "40"
        limits.memory: 128Gi
  networkPolicies:
    items:
    - policyTypes: [Ingress, Egress]
      podSelector: {}
      egress:
      - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
        ports: [{ protocol: UDP, port: 53 }]
```
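The Tenant above sets a quota but no LimitRange, which is the unpaired quota that Section 2 warns about. A sketch of the extra stanza, assuming Capsule’s v1beta2 limitRanges field and reusing the defaults from Section 2; merge it into the spec of the Tenant above.

```yaml
# Fragment: merge into spec of the acme Tenant (Capsule v1beta2 limitRanges field).
spec:
  limitRanges:
    items:
    - limits:
      - type: Container
        default:             # limit applied when a container omits one
          cpu: "500m"
          memory: 256Mi
        defaultRequest:      # request applied when a container omits one
          cpu: "100m"
          memory: 128Mi
```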
The guardrail principle: tenants get self-service within a fence. They can create namespaces (up to a cap), deploy workloads, and define their own intra-tenant policy — but they cannot exceed quota, escape their network boundary, schedule onto other tenants’ nodes, or grant themselves cluster-admin. Anything they cannot be trusted to self-serve becomes an admission rule (Kyverno ClusterPolicy / Gatekeeper Constraint) that rejects the violating object at apply time, not a wiki page asking them not to.
Verify
Prove each boundary holds before you onboard a real tenant. Treat green output as the contract.
```shell
# 1. Quota is enforced: 6 replicas x 4 CPU = 24 CPU of requests, above requests.cpu: "20".
#    Each pod stays within the LimitRange max of 4 CPU, so the rejection comes from quota,
#    not LimitRange (assumes tenant-acme is otherwise empty).
kubectl -n tenant-acme create deployment quota-test --image=registry.k8s.io/pause:3.9 --replicas=0
kubectl -n tenant-acme set resources deployment quota-test --requests=cpu=4 --limits=cpu=4
kubectl -n tenant-acme scale deployment quota-test --replicas=6
kubectl -n tenant-acme get events --field-selector reason=FailedCreate
# Expected: a FailedCreate event containing "exceeded quota: tenant-quota"
kubectl -n tenant-acme delete deployment quota-test

# 2. Default-deny holds: a pod in tenant-acme cannot reach a pod in tenant-globex
kubectl -n tenant-acme run probe --image=nicolaka/netshoot --rm -it --restart=Never -- \
  curl -m 4 http://svc.tenant-globex.svc.cluster.local
# Expected: timeout / connection refused (NOT a 200)

# 3. Priority-class quota holds: a free tenant cannot claim platinum priority
kubectl -n tenant-free-001 run sneaky --image=registry.k8s.io/pause:3.9 \
  --overrides='{"spec":{"priorityClassName":"tenant-platinum"}}'
# Expected: forbidden: exceeded quota: deny-high-priority

# 4. vcluster API isolation: tenant context sees only their namespaces
vcluster connect acme -n tenant-acme-vc -- kubectl get ns
# Expected: default, kube-system, etc. of the VIRTUAL cluster only, never host tenants

# 5. Sandbox is actually in use (the pod's image must ship a dmesg binary)
kubectl -n tenant-partner exec untrusted-job -- dmesg 2>&1 | head -1
# Expected under gVisor: its synthetic boot log, e.g. "[   0.000000] Starting gVisor..."
# Under bare runc you get the host kernel log, or "Operation not permitted" if dmesg_restrict is set
```
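Also confirm the CNI enforces policy at all. A timeout in test 2 proves nothing if the target service does not exist or is not listening, and on a CNI that ignores NetworkPolicy the same probe simply succeeds. So first show the flow works without the deny (or from an allowed source), then apply the deny-all and prove the same flow breaks, before trusting that the flows you allowed are the only ones open.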
Enterprise scenario
A fintech platform team ran a shared “internal-apps” EKS cluster with namespace-per-team and RBAC: soft tenancy, and fine for years. Then a new requirement landed: a regulated reconciliation product had to onboard a third-party vendor’s batch engine that shipped as a Helm chart bundling its own CRDs and a cluster-scoped operator. Two problems collided. The vendor’s operator wanted cluster-wide CRD installation, which would have been visible to every other team and could have collided with their CRDs. And compliance classified the vendor code as untrusted, forbidding it from sharing a kernel with workloads that touched cardholder data.
A full cluster-per-vendor was the obvious answer and got rejected on cost and lead time — provisioning, patching, and observability for a new cluster per vendor integration was weeks of platform toil they could not absorb per deal. The team split the requirement across two tiers instead. They gave the vendor a vcluster for API/CRD autonomy, so the vendor’s operator and CRDs lived entirely inside the virtual control plane and never touched the host API or other teams. Then they satisfied the kernel-isolation requirement by pinning that vcluster’s pods to a dedicated, tainted node pool running gVisor, enforced with a Kyverno policy that rejected any pod in the synced host namespace lacking the sandbox RuntimeClass:
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-sandbox-for-vendor
spec:
  validationFailureAction: Enforce
  rules:
  - name: vendor-pods-must-be-sandboxed
    match:
      any:
      - resources:
          kinds: ["Pod"]
          namespaces: ["tenant-vendor-vc"]   # the host namespace backing the vcluster
    validate:
      message: "Vendor pods must run under the gvisor-isolated RuntimeClass."
      pattern:
        spec:
          runtimeClassName: "gvisor-isolated"
```
The result: API autonomy without a new cluster, kernel isolation without trusting the vendor, and a boundary enforced by admission rather than convention. The deciding insight was that “untrusted tenant needing CRDs” is not one tier on the slider — it is two separate boundaries (control-plane and kernel) that you compose, and vcluster-plus-sandbox composed them at a fraction of cluster-per-tenant cost. Six months later they had a repeatable vendor-onboarding pattern: one Tenant object, one vcluster, one sandboxed pool.
Pitfalls
- Treating a namespace as a security boundary. It isolates names and RBAC, nothing else. Shared kernel, nodes, CNI, and API server remain. Match the model to the threat, not to the org chart.
- Quota without LimitRange. The first pod a tenant submits without an explicit request gets rejected by quota admission, and they will blame the platform. Always pair them.
- Forgetting to quota the PriorityClass. A priority class with value: 100000 is worthless if any tenant can name it. Cap high-priority pod counts per low tier or your fairness scheme is decorative.
- Letting tenants write cross-namespace NetworkPolicies. In soft tenancy this lets a tenant allow themselves into a neighbor. Constrain namespaceSelector via admission to the tenant’s own subtree.
- Assuming vcluster sandboxes the kernel. It isolates the API/control plane only; pods run on shared host nodes. Compose it with gVisor/Kata or dedicated nodes for untrusted tenants.
- Self-service without a fence. “Tenants can create namespaces” must be bounded (cap, quota, mandatory baseline policy) or you have handed out cluster-admin with extra steps.
Next steps: wire noisy-neighbor detection (per-namespace CPU-throttling and API-request-rate dashboards) and cost showback (label every tenant object with a tenant key and bill from per-namespace resource requests in kube-state-metrics, or from container usage metrics) into the same pipeline that provisions tenants, so capacity governance and chargeback are byproducts of onboarding rather than a separate quarterly scramble.