Production-Grade AKS: Networking, Ingress, and Observability

A demo az aks create gets you a cluster; it does not get you a production cluster. This guide covers the five decisions that actually matter at scale: networking model, identity, ingress, network policy, and observability, with the commands and manifests to implement each.

[Figure: Production AKS architecture]

1. Networking: choose Azure CNI Overlay

Three models, one right default for most:

| Model | Pod IPs | When |
| --- | --- | --- |
| kubenet | NAT’d, not routable | legacy; avoid |
| Azure CNI (classic) | from VNet subnet | when pods must be directly routable, but burns VNet IPs fast |
| Azure CNI Overlay | private overlay CIDR | default: VNet-scale without exhausting subnet IPs |

Provision with Overlay + Cilium data plane (eBPF) via Terraform:

resource "azurerm_kubernetes_cluster" "prod" {
  name                = "aks-prod-eus"
  location            = "eastus"
  resource_group_name = azurerm_resource_group.aks.name
  dns_prefix          = "aksprod"
  oidc_issuer_enabled       = true   # required for workload identity
  workload_identity_enabled = true

  default_node_pool {
    name                 = "system"
    vm_size              = "Standard_D4ds_v5"
    auto_scaling_enabled = true
    min_count            = 3
    max_count            = 6
    only_critical_addons_enabled = true   # taint system pool; run apps elsewhere
  }

  network_profile {
    network_plugin      = "azure"
    network_plugin_mode = "overlay"
    network_policy      = "cilium"
    network_data_plane  = "cilium"
    pod_cidr            = "10.244.0.0/16"
    service_cidr        = "10.0.0.0/16"
    dns_service_ip      = "10.0.0.10"
  }
  identity { type = "SystemAssigned" }
}

Add a separate user node pool for workloads so system add-ons never compete with apps:

az aks nodepool add -g rg-aks --cluster-name aks-prod-eus \
  --name apps --mode User --node-vm-size Standard_D8ds_v5 \
  --enable-cluster-autoscaler --min-count 3 --max-count 20

2. Identity: Workload Identity, not secrets

Stop mounting service-principal secrets. Microsoft Entra Workload Identity federates a Kubernetes service account to a managed identity — pods get Entra tokens with no secrets to rotate.

# 1) create a user-assigned managed identity and give it access (e.g., Key Vault)
az identity create -g rg-aks -n id-payments
az role assignment create --assignee <id-client-id> \
  --role "Key Vault Secrets User" --scope <keyvault-resource-id>

# 2) federate it to the k8s service account
az identity federated-credential create --name payments-fed \
  --identity-name id-payments -g rg-aks \
  --issuer "$(az aks show -g rg-aks -n aks-prod-eus --query oidcIssuerProfile.issuerUrl -o tsv)" \
  --subject system:serviceaccount:payments:payments-sa

Then annotate the service account and opt the pods in with a label:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: payments-sa
  namespace: payments
  annotations:
    azure.workload.identity/client-id: "<id-client-id>"
---
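apiVersion: apps/v1
kind: Deployment
metadata: { name: payments, namespace: payments }
spec:
  selector:
    matchLabels: { app: payments }
  template:
    metadata:
      labels:
        app: payments
        azure.workload.identity/use: "true"   # webhook injects the federated token
    spec:
      serviceAccountName: payments-sa
      containers:
        - name: payments
          image: <your-image>   # placeholder; supply your own image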
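
The federated credential only matches when its subject is exactly system:serviceaccount:<namespace>:<service-account>, so it can help to build that string from the same two values used in the manifest. The sketch below is illustrative: wi_subject and check_wi_env are hypothetical helper names, and check_wi_env assumes kubectl access to a running cluster with the payments Deployment deployed.

```shell
#!/bin/sh
# wi_subject NAMESPACE SERVICE_ACCOUNT: build the federated-credential
# subject for a Kubernetes service account.
wi_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# check_wi_env NAMESPACE DEPLOYMENT: list the AZURE_* variables that the
# workload identity webhook injects (needs kubectl access to a live cluster).
check_wi_env() {
  kubectl exec -n "$1" "deploy/$2" -- env | grep '^AZURE_'
}

wi_subject payments payments-sa
```

Running check_wi_env payments payments against the cluster should list AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_FEDERATED_TOKEN_FILE, and AZURE_AUTHORITY_HOST once the label and annotation are in place.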

3. Ingress: managed app routing + cert-manager

Enable the AKS-managed NGINX ingress (app routing add-on) so you don’t operate the controller yourself:

az aks approuting enable -g rg-aks -n aks-prod-eus

Then expose a service with TLS. This assumes cert-manager is already installed and a ClusterIssuer named letsencrypt-prod issues Let’s Encrypt certificates:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: payments
  namespace: payments
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: webapprouting.kubernetes.azure.com
  tls:
    - hosts: ["pay.kloudvin.com"]
      secretName: pay-tls
  rules:
    - host: pay.kloudvin.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend: { service: { name: payments, port: { number: 80 } } }
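
The Ingress above refers to a ClusterIssuer named letsencrypt-prod that the article does not define. A minimal sketch of one follows, rendered from a shell function so the contact email stays a parameter. It assumes a cert-manager version that supports ingressClassName on the HTTP-01 solver; render_cluster_issuer and the account-key secret name are hypothetical.

```shell
#!/bin/sh
# render_cluster_issuer EMAIL: print a ClusterIssuer manifest named
# letsencrypt-prod that solves HTTP-01 challenges through the app routing
# ingress class. EMAIL is the ACME contact address you supply.
render_cluster_issuer() {
  cat <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: $1
    privateKeySecretRef:
      name: letsencrypt-prod-account-key   # hypothetical secret name
    solvers:
      - http01:
          ingress:
            ingressClassName: webapprouting.kubernetes.azure.com
EOF
}

# Usage against a live cluster with cert-manager installed:
#   render_cluster_issuer ops@example.com | kubectl apply -f -
```

Keeping the solver on the same ingress class as the application means the ACME challenge is served by the managed NGINX controller instead of a second ingress.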

4. Network policy: default-deny

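Lock down east-west traffic. With Cilium/Azure network policy, start every namespace at default-deny and open only what’s needed. The first policy below denies egress as well as ingress, so pods lose DNS and all outbound access until you add matching egress rules: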

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: default-deny, namespace: payments }
spec:
  podSelector: {}
  policyTypes: ["Ingress", "Egress"]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: allow-from-ingress, namespace: payments }
spec:
  podSelector: { matchLabels: { app: payments } }
  ingress:
    - from:
        - namespaceSelector: { matchLabels: { kubernetes.io/metadata.name: app-routing-system } }   # namespace of the managed NGINX controller
      ports: [{ port: 80 }]
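
Because the default-deny policy covers egress, the first rule most namespaces need is one that restores DNS. The following is a sketch, not a complete egress policy: it assumes the cluster DNS pods run in kube-system with the label k8s-app: kube-dns, and render_dns_egress is a hypothetical helper name. Egress to Entra ID, Key Vault, or other dependencies still has to be opened separately.

```shell
#!/bin/sh
# render_dns_egress NAMESPACE: print a NetworkPolicy that lets every pod in
# NAMESPACE reach cluster DNS on port 53 (UDP and TCP).
render_dns_egress() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: $1
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector:
            matchLabels: { kubernetes.io/metadata.name: kube-system }
          podSelector:
            matchLabels: { k8s-app: kube-dns }
      ports:
        - { protocol: UDP, port: 53 }
        - { protocol: TCP, port: 53 }
EOF
}

# Usage against a live cluster:
#   render_dns_egress payments | kubectl apply -f -
```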

5. Observability: managed Prometheus + Grafana

Don’t self-host the metrics stack. Enable Azure Monitor managed Prometheus linked to Azure Managed Grafana, then turn on Container Insights:

# create a managed Grafana instance
az grafana create -g rg-aks -n graf-kloudvin

# enable managed Prometheus and link it to Grafana
az aks update -g rg-aks -n aks-prod-eus \
  --enable-azure-monitor-metrics \
  --grafana-resource-id "$(az grafana show -g rg-aks -n graf-kloudvin --query id -o tsv)"

# enable Container Insights (the monitoring add-on)
az aks enable-addons -g rg-aks -n aks-prod-eus --addons monitoring

Scrape your own app metrics by adding a PodMonitor that selects the pods by label; the container must expose a port named metrics:

apiVersion: azmonitoring.coreos.com/v1
kind: PodMonitor
metadata: { name: payments, namespace: payments }
spec:
  selector: { matchLabels: { app: payments } }
  podMetricsEndpoints: [{ port: metrics, interval: 30s }]

Alert first on the four “golden signals”: latency, traffic, errors, and saturation. Define the alerts in Grafana and route notifications through Azure Monitor action groups.

Enterprise scenario

A fintech platform team migrated ~40 services onto a shared AKS cluster using classic Azure CNI. It worked in staging, then production node scale-out started failing with InsufficientSubnetSize. Classic CNI assigns every pod a real VNet IP, and with maxPods=110 per node a single /22 node subnet was exhausted well before the autoscaler hit its ceiling — the subnet, not compute, was the bottleneck. Re-IPing a peered hub-and-spoke VNet that other teams depended on was a non-starter.
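
The arithmetic behind that exhaustion can be sketched in a few lines of shell. This is a back-of-the-envelope estimate, assuming classic Azure CNI reserves one address for the node plus maxPods addresses for its pods, and that Azure holds back five addresses in every subnet. The function names are illustrative, and upgrade surge nodes (not modeled here) lower the real ceiling further.

```shell
#!/bin/sh
# Rough subnet capacity estimate; function names are illustrative.

# usable_ips PREFIX: addresses available in a subnet of the given prefix
# length, after the 5 addresses Azure reserves in every subnet.
usable_ips() {
  echo $(( (1 << (32 - $1)) - 5 ))
}

# max_nodes_classic PREFIX MAX_PODS: each node takes 1 IP for itself
# plus MAX_PODS pre-allocated pod IPs from the same subnet.
max_nodes_classic() {
  echo $(( $(usable_ips "$1") / ($2 + 1) ))
}

# max_nodes_overlay PREFIX: only nodes take VNet IPs; pods use the overlay CIDR.
max_nodes_overlay() {
  usable_ips "$1"
}

echo "classic CNI, /22, maxPods=110: $(max_nodes_classic 22 110) nodes"
echo "CNI Overlay, /22: $(max_nodes_overlay 22) nodes"
```

Under these assumptions the /22 tops out at 9 nodes with classic CNI, against 1019 with Overlay.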

The fix was migrating pod networking to Azure CNI Overlay, where pods draw from a private overlay CIDR and only nodes consume VNet IPs. This is an in-place cluster update, not a rebuild:

az aks update -g rg-aks -n aks-prod-eus \
  --network-plugin-mode overlay \
  --pod-cidr 10.244.0.0/16

Two gotchas bit them. First, the migration is one-way and requires draining and recycling every node, so they ran it during a window with PodDisruptionBudgets in place. Second, a legacy service depended on directly routable pod IPs from an on-prem caller; overlay pod IPs are not reachable outside the cluster, so that path had to move behind the internal load balancer. After the cutover the same /22 comfortably supported the full node count, and VNet IP consumption dropped from 111 addresses per node (the node plus 110 pre-allocated pod IPs) to one per node, buying years of headroom without touching the hub network. Validate the result with kubectl get pods -A -o wide and confirm pod IPs land in the overlay range.
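
Checking pod IPs by eye gets tedious on a cluster with ~40 services, so the range check can be scripted. The helpers below are a sketch with hypothetical names (ip_to_int, ip_in_cidr, report_pods_outside) and handle IPv4 only. Pods running with hostNetwork: true report their node's IP and will be flagged, which is expected.

```shell
#!/bin/sh
# ip_to_int A.B.C.D: convert a dotted IPv4 address to an integer.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# ip_in_cidr IP CIDR: succeed if IP falls inside CIDR.
ip_in_cidr() (
  net=${2%/*}
  prefix=${2#*/}
  size=$(( 1 << (32 - prefix) ))
  ip=$(ip_to_int "$1")
  base=$(ip_to_int "$net")
  start=$(( base - base % size ))
  [ "$ip" -ge "$start" ] && [ "$ip" -lt $(( start + size )) ]
)

# report_pods_outside CIDR: read "namespace/name ip" lines on stdin and
# print the ones whose IP is outside CIDR.
report_pods_outside() {
  while read -r pod ip; do
    [ -n "$ip" ] || continue
    ip_in_cidr "$ip" "$1" || echo "outside overlay: $pod $ip"
  done
}

# Usage against a live cluster:
#   kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name} {.status.podIP}{"\n"}{end}' \
#     | report_pods_outside 10.244.0.0/16
```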

Production readiness checklist

Verify

kubectl get nodes -o wide                          # node pools healthy
kubectl get networkpolicy -A                       # default-deny present
kubectl describe sa payments-sa -n payments        # workload-identity annotation
kubectl get ingress -n payments                    # address assigned, TLS secret created
az aks show -g rg-aks -n aks-prod-eus --query azureMonitorProfile  # metrics enabled

With these five pillars in place you have a cluster that scales without exhausting VNet IPs and is secret-free, TLS-terminated, segmented, and observable. Everything else (GitOps with Argo CD, progressive delivery, service mesh) layers cleanly on top.


// part 1 of 2 · AKS in Production
