A demo az aks create gets you a cluster; it does not get you a production cluster. This guide covers the five decisions that actually matter at scale (networking model, identity, ingress, network policy, and observability), with the commands and manifests to implement each.
1. Networking: choose Azure CNI Overlay
Three models, one right default for most:
| Model | Pod IPs | When |
|---|---|---|
| kubenet | NAT’d, not routable | legacy; avoid |
| Azure CNI (classic) | from VNet subnet | when pods must be directly routable, but burns VNet IPs fast |
| Azure CNI Overlay | private overlay CIDR | default — VNet-scale without exhausting subnet IPs |
Provision with Overlay + Cilium data plane (eBPF) via Terraform:
resource "azurerm_kubernetes_cluster" "prod" {
  name = "aks-prod-eus"
  location = "eastus"
  resource_group_name = azurerm_resource_group.aks.name
  dns_prefix = "aksprod"
  oidc_issuer_enabled = true # required for workload identity
  workload_identity_enabled = true
  default_node_pool {
    name = "system"
    vm_size = "Standard_D4ds_v5"
    auto_scaling_enabled = true
    min_count = 3
    max_count = 6
    only_critical_addons_enabled = true # taint system pool; run apps elsewhere
  }
  network_profile {
    network_plugin = "azure"
    network_plugin_mode = "overlay"
    network_policy = "cilium"
    network_data_plane = "cilium"
    pod_cidr = "10.244.0.0/16"
    service_cidr = "10.0.0.0/16"
    dns_service_ip = "10.0.0.10"
  }
  identity { type = "SystemAssigned" }
}
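The three address ranges in network_profile have to fit together: AKS expects the pod CIDR and the service CIDR not to overlap each other or the VNet, and dns_service_ip to sit inside service_cidr. A minimal bash sketch of that check follows. The helper names are illustrative, and VNET_CIDR is a hypothetical value, since this guide does not state the VNet range.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, and VNET_CIDR is a hypothetical value.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Print the first and last address of a CIDR as two integers.
cidr_range() {
  local ip=${1%/*} prefix=${1#*/}
  local size=$(( 1 << (32 - prefix) ))
  local base=$(( $(ip_to_int "$ip") & ~(size - 1) & 0xFFFFFFFF ))
  echo "$base $(( base + size - 1 ))"
}

# Succeed (exit 0) when two CIDRs share at least one address.
cidrs_overlap() {
  local a_first a_last b_first b_last
  read -r a_first a_last <<< "$(cidr_range "$1")"
  read -r b_first b_last <<< "$(cidr_range "$2")"
  (( a_first <= b_last && b_first <= a_last ))
}

# Succeed when an IP address falls inside a CIDR.
ip_in_cidr() {
  local first last ip
  read -r first last <<< "$(cidr_range "$2")"
  ip=$(ip_to_int "$1")
  (( ip >= first && ip <= last ))
}

# Report an overlap between two ranges, if there is one.
check_pair() {
  if cidrs_overlap "$1" "$2"; then
    echo "overlap: $1 and $2"
  fi
}

POD_CIDR="10.244.0.0/16"      # pod_cidr from the Terraform above
SERVICE_CIDR="10.0.0.0/16"    # service_cidr from the Terraform above
DNS_SERVICE_IP="10.0.0.10"    # dns_service_ip from the Terraform above
VNET_CIDR="10.10.0.0/16"      # hypothetical: substitute your VNet range

check_pair "$POD_CIDR" "$SERVICE_CIDR"
check_pair "$POD_CIDR" "$VNET_CIDR"
check_pair "$SERVICE_CIDR" "$VNET_CIDR"
ip_in_cidr "$DNS_SERVICE_IP" "$SERVICE_CIDR" || echo "dns_service_ip is outside service_cidr"
echo "CIDR checks finished"
```

With the values shown, only the final line is printed. Setting VNET_CIDR to 10.0.0.0/8 makes the script report an overlap with both the pod and the service range.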
Add a separate user node pool for workloads so system add-ons never compete with apps:
az aks nodepool add -g rg-aks --cluster-name aks-prod-eus \
--name apps --mode User --node-vm-size Standard_D8ds_v5 \
--enable-cluster-autoscaler --min-count 3 --max-count 20
2. Identity: Workload Identity, not secrets
Stop mounting service-principal secrets. Microsoft Entra Workload Identity federates a Kubernetes service account to a managed identity — pods get Entra tokens with no secrets to rotate.
# 1) create a user-assigned managed identity and give it access (e.g., Key Vault)
az identity create -g rg-aks -n id-payments
az role assignment create --assignee <id-client-id> \
--role "Key Vault Secrets User" --scope <keyvault-resource-id>
# 2) federate it to the k8s service account
az identity federated-credential create --name payments-fed \
--identity-name id-payments -g rg-aks \
--issuer "$(az aks show -g rg-aks -n aks-prod-eus --query oidcIssuerProfile.issuerUrl -o tsv)" \
--subject system:serviceaccount:payments:payments-sa
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payments-sa
  namespace: payments
  annotations:
    azure.workload.identity/client-id: "<id-client-id>"
---
apiVersion: apps/v1
kind: Deployment
metadata: { name: payments, namespace: payments }
spec:
  # selector and containers omitted for brevity
  template:
    metadata:
      labels: { azure.workload.identity/use: "true" } # injects the token
    spec:
      serviceAccountName: payments-sa
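Once a pod is running, the workload identity webhook should have injected four environment variables into it: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_FEDERATED_TOKEN_FILE, and AZURE_AUTHORITY_HOST. A small bash sketch that reports any that are missing is shown here. The function name is illustrative, and it assumes the container image ships bash, which many minimal images do not.

```shell
#!/usr/bin/env bash
# Sketch only: run inside a pod of the payments Deployment.
# missing_wi_vars is an illustrative name, not part of any Azure tooling.

# Print the names of unset workload-identity variables; succeed if none are missing.
missing_wi_vars() {
  local name missing=()
  for name in AZURE_CLIENT_ID AZURE_TENANT_ID AZURE_FEDERATED_TOKEN_FILE AZURE_AUTHORITY_HOST; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    [[ -n "${!name:-}" ]] || missing+=("$name")
  done
  echo "${missing[*]}"
  (( ${#missing[@]} == 0 ))
}

if missing=$(missing_wi_vars); then
  echo "workload identity variables present"
else
  echo "missing: $missing"
fi
```

If variables are missing, the usual suspects are the azure.workload.identity/use label on the pod template and the serviceAccountName, both shown in the manifest above.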
3. Ingress: managed app routing + cert-manager
Enable the AKS-managed NGINX ingress (app routing add-on) so you don’t operate the controller yourself:
az aks approuting enable -g rg-aks -n aks-prod-eus
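Then expose a service with TLS. This assumes cert-manager is installed separately (the app routing add-on does not include it) and that a ClusterIssuer named letsencrypt-prod exists to issue Let’s Encrypt certificates: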
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: payments
  namespace: payments
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: webapprouting.kubernetes.azure.com
  tls:
    - hosts: ["pay.kloudvin.com"]
      secretName: pay-tls
  rules:
    - host: pay.kloudvin.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend: { service: { name: payments, port: { number: 80 } } }
4. Network policy: default-deny
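Lock down east-west traffic. With the Cilium network policy engine configured earlier, start every namespace at default-deny and open only what’s needed. Because the default-deny policy also covers egress, pods lose DNS resolution until you add an explicit egress rule for kube-dns (port 53 in kube-system):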
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: default-deny, namespace: payments }
spec:
  podSelector: {}
  policyTypes: ["Ingress", "Egress"]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: allow-from-ingress, namespace: payments }
spec:
  podSelector: { matchLabels: { app: payments } }
  ingress:
    - from:
        # namespace where the app routing add-on runs its NGINX controller
        - namespaceSelector: { matchLabels: { kubernetes.io/metadata.name: app-routing-system } }
      ports: [{ port: 80 }]
5. Observability: managed Prometheus + Grafana
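Don’t self-host the metrics stack. Enable Azure Monitor managed Prometheus on the cluster, then link an Azure Managed Grafana instance. The --enable-azure-monitor-app-monitoring flag additionally turns on application monitoring. Container Insights (container logs) is a separate add-on, enabled with az aks enable-addons --addons monitoring: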
az aks update -g rg-aks -n aks-prod-eus \
--enable-azure-monitor-metrics \
--enable-azure-monitor-app-monitoring
# link a managed Grafana instance
az grafana create -g rg-aks -n graf-kloudvin
az aks update -g rg-aks -n aks-prod-eus --enable-azure-monitor-metrics \
  --grafana-resource-id "$(az grafana show -g rg-aks -n graf-kloudvin --query id -o tsv)"
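Scrape your own app metrics by adding a PodMonitor that selects the pods by label. The port value refers to a named container port (here, metrics), so the pod spec must declare one: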
apiVersion: azmonitoring.coreos.com/v1
kind: PodMonitor
metadata: { name: payments, namespace: payments }
spec:
  selector: { matchLabels: { app: payments } }
  podMetricsEndpoints: [{ port: metrics, interval: 30s }]
The four signals to alert on first (the “golden signals”): latency, traffic, errors, saturation. Wire alerts in Grafana → Azure Monitor action groups.
Enterprise scenario
A fintech platform team migrated ~40 services onto a shared AKS cluster using classic Azure CNI. It worked in staging, then production node scale-out started failing with InsufficientSubnetSize. Classic CNI assigns every pod a real VNet IP, and with maxPods=110 per node a single /22 node subnet was exhausted well before the autoscaler hit its ceiling — the subnet, not compute, was the bottleneck. Re-IPing a peered hub-and-spoke VNet that other teams depended on was a non-starter.
The fix was switching the cluster's network plugin mode to Azure CNI Overlay, where pods draw from a private overlay CIDR and only nodes consume VNet IPs. This is an in-place cluster update, not a rebuild:
az aks update -g rg-aks -n aks-prod-eus \
--network-plugin-mode overlay \
--pod-cidr 10.244.0.0/16
Two gotchas bit them. First, the migration is one-way and requires draining and recycling every node, so they ran it during a maintenance window with PodDisruptionBudgets in place. Second, a legacy service depended on directly routable pod IPs from an on-prem caller; overlay pod IPs are not reachable from outside the cluster, so that path had to move behind the internal load balancer. After the cutover the same /22 comfortably supported the full node count, and VNet IP consumption dropped from the whole subnet, most of it reserved for pods, to a few dozen node IPs, buying years of headroom without touching the hub network. Validate the cutover with kubectl get pods -A -o wide and confirm pod IPs land in the overlay range (host-network pods keep their node's IP).
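The arithmetic behind the scenario can be sketched in a few lines of bash. It assumes that Azure reserves 5 addresses per subnet, that classic Azure CNI sets aside one IP per node plus maxPods IPs for its pods, and that Overlay hands each node a /24 slice of the pod CIDR. It ignores the spare addresses that upgrade surge nodes need, so real capacity is slightly lower. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: rough node capacity of a subnet under classic Azure CNI vs. Overlay.

# Usable addresses in a subnet of the given prefix length (Azure reserves 5 per subnet).
usable_ips() {
  echo $(( (1 << (32 - $1)) - 5 ))
}

# Classic CNI: each node takes 1 IP for itself plus max_pods IPs reserved for pods.
classic_max_nodes() {
  local prefix=$1 max_pods=$2
  echo $(( $(usable_ips "$prefix") / (max_pods + 1) ))
}

# Overlay: nodes take one subnet IP each, and each node needs a /24 from the pod CIDR.
# Assumes the pod CIDR prefix is 24 or shorter.
overlay_max_nodes() {
  local subnet_prefix=$1 pod_cidr_prefix=$2
  local by_subnet by_pod_cidr
  by_subnet=$(usable_ips "$subnet_prefix")
  by_pod_cidr=$(( 1 << (24 - pod_cidr_prefix) ))
  echo $(( by_subnet < by_pod_cidr ? by_subnet : by_pod_cidr ))
}

echo "classic, /22 subnet, maxPods=110: $(classic_max_nodes 22 110) nodes"
echo "overlay, /22 subnet, /16 pod CIDR: $(overlay_max_nodes 22 16) nodes"
```

Under these assumptions the /22 holds 9 nodes with classic CNI and maxPods=110. With Overlay the limit becomes the /16 pod CIDR, at 256 nodes, rather than the subnet.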
Production readiness checklist
Verify
kubectl get nodes -o wide # node pools healthy
kubectl get networkpolicy -A # default-deny present
kubectl describe sa payments-sa -n payments # workload-identity annotation
kubectl get ingress -n payments # address assigned, TLS secret created
az aks show -g rg-aks -n aks-prod-eus --query azureMonitorProfile # metrics enabled
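The same checks can be wrapped in a small script that prints PASS or FAIL per item. This is a sketch: run_check and main are illustrative names, and each check only tests that the command exits successfully, meaning the object exists. It does not inspect annotations, TLS secrets, or the monitoring profile, which still need a look by eye.

```shell
#!/usr/bin/env bash
# Sketch only: existence checks for the resources created in this guide.

failures=0

# run_check LABEL COMMAND [ARGS...]: run the command quietly and record the result.
run_check() {
  local label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    failures=$(( failures + 1 ))
  fi
}

main() {
  run_check "nodes reachable"            kubectl get nodes
  run_check "default-deny policy exists" kubectl get networkpolicy default-deny -n payments
  run_check "service account exists"     kubectl get sa payments-sa -n payments
  run_check "ingress exists"             kubectl get ingress payments -n payments
  run_check "cluster readable via az"    az aks show -g rg-aks -n aks-prod-eus
  echo "$failures check(s) failed"
  [ "$failures" -eq 0 ]
}

# Run the checks only when asked, so the functions can be sourced without touching a cluster.
if [ "${1:-}" = "--run" ]; then
  main
fi
```

Invoke it with --run from a shell where kubectl and az are already authenticated against the cluster.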
With these five pillars in place you have a cluster that scales without exhausting VNet IPs and is secret-free, TLS-terminated, segmented, and observable. Everything else (GitOps with Argo CD, progressive delivery, service mesh) layers cleanly on top.