
Cilium and eBPF Network Policy: L3-L7 Segmentation and Hubble Flow Visibility

Kubernetes network policy on a stock cluster is a stack of compromises. The native NetworkPolicy object is L3/L4 only: it cannot say “allow GET but not DELETE,” and it cannot match a destination by DNS name. Enforcement underneath is typically IP-based, so label selectors have to be re-translated into address rules every time a pod gets a new IP, while the kube-proxy iptables backend flattens Services into a linear chain of rules that grows with every Service and endpoint. At a few thousand services that chain becomes a measurable source of latency and a CPU sink, and when a packet gets dropped you have no native way to learn why. Cilium replaces both halves of that problem: an eBPF dataplane that enforces policy on a stable pod identity rather than an IP, and Hubble, a flow-observability layer that reports the policy verdict for every flow. This walkthrough installs Cilium in kube-proxy-replacement mode, builds a default-deny posture, layers L3 through L7 and FQDN-based egress policy on top, and uses Hubble to prove the result.

1. Why eBPF beats iptables for policy enforcement

kube-proxy in iptables mode translates every Service into a set of rules in the nat table. Packet matching there is O(n) in the number of rules – the kernel walks the chain top to bottom for each new connection. Each Service adds rules; each endpoint adds more. On a large cluster this chain reaches tens of thousands of rules, and the per-connection walk shows up as setup latency and softirq CPU. ipvs mode improves the lookup to a hash, but you still carry the conntrack and rule-management overhead, and policy enforcement still happens on IP tuples.

Cilium attaches eBPF programs at the network device and socket layers. Service translation and policy verdicts are hash lookups in eBPF maps, effectively O(1) regardless of cluster size, and for in-cluster traffic Cilium can perform the load-balancing translation at the socket layer (connect() time) so the packet never carries a service VIP into the network at all. The decisive difference for policy: Cilium does not enforce on IP. Every pod is assigned a numeric security identity derived from its labels. In tunnel mode that identity travels with the packet in the VXLAN/Geneve header; in native-routing mode the receiving node recovers it by looking up the source IP in an eBPF map (the ipcache) that Cilium keeps in sync. A policy verdict is “does identity A allow identity B on this port,” resolved once and cached, which is exactly why the same policy keeps working when a pod restarts onto a new IP.
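The cost difference between a linear chain and a hash lookup can be sketched with a toy model. This is illustration only, not how either dataplane is implemented: the service addresses, backend names, and function names are made up, and a Python dict stands in for an eBPF hash map.

```python
from typing import Optional

ServiceKey = tuple[str, int]  # (service virtual IP, port)


def build_rules(n: int) -> list[tuple[ServiceKey, str]]:
    """Generate n made-up (VIP, port) -> backend entries."""
    return [((f"10.96.{i // 256}.{i % 256}", 80), f"backend-{i}") for i in range(n)]


def chain_lookup(rules, key) -> tuple[Optional[str], int]:
    """Walk the rules top to bottom, like a linear chain.

    Returns the matching backend and how many rules were examined.
    """
    for examined, (rule_key, backend) in enumerate(rules, start=1):
        if rule_key == key:
            return backend, examined
    return None, len(rules)


def map_lookup(table, key) -> Optional[str]:
    """Single hash lookup; cost does not depend on the number of entries."""
    return table.get(key)


rules = build_rules(5000)
table = dict(rules)
last_key = rules[-1][0]  # worst case for the linear walk

print(chain_lookup(rules, last_key))  # ('backend-4999', 5000)
print(map_lookup(table, last_key))    # backend-4999
```

The linear walk examines all 5000 entries to reach the last service, while the map lookup is one probe whatever the table size.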

The mental model shift: stop thinking “allow 10.0.3.0/24 to 10.0.4.7:443” and start thinking “allow identity app=frontend to identity app=api on 443.” The IP is an implementation detail Cilium manages; the identity is the policy.
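A minimal sketch of that shift, assuming a simplified model in which an identity is just a number allocated per unique label set and an IP-to-identity table is updated as pods come and go. The identity numbers, IPs, and helper names are hypothetical; real Cilium allocates identities cluster-wide and stores them in eBPF maps.

```python
from itertools import count

_next_identity = count(1000)  # arbitrary starting number for the sketch
_identities: dict[frozenset, int] = {}


def identity_for(labels: dict[str, str]) -> int:
    """Same label set always maps to the same numeric identity."""
    key = frozenset(labels.items())
    if key not in _identities:
        _identities[key] = next(_next_identity)
    return _identities[key]


ip_to_identity: dict[str, int] = {}          # stand-in for the IP -> identity table
allowed: set[tuple[int, int, int]] = set()   # (source identity, destination identity, port)


def schedule_pod(ip: str, labels: dict[str, str]) -> None:
    ip_to_identity[ip] = identity_for(labels)


def verdict(src_ip: str, dst_ip: str, port: int) -> str:
    key = (ip_to_identity.get(src_ip), ip_to_identity.get(dst_ip), port)
    return "FORWARDED" if key in allowed else "DROPPED"


frontend = {"app": "frontend"}
api = {"app": "api"}
schedule_pod("10.0.3.12", frontend)
schedule_pod("10.0.4.7", api)

# The policy is written once, against identities, never against IPs.
allowed.add((identity_for(frontend), identity_for(api), 443))
print(verdict("10.0.3.12", "10.0.4.7", 443))    # FORWARDED

# The api pod restarts onto a new IP; the policy itself is untouched.
del ip_to_identity["10.0.4.7"]
schedule_pod("10.0.9.200", api)
print(verdict("10.0.3.12", "10.0.9.200", 443))  # FORWARDED
print(verdict("10.0.3.12", "10.0.4.7", 443))    # DROPPED (old IP maps to no identity)
```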

2. Install Cilium in kube-proxy-replacement mode

Replacing kube-proxy is the highest-leverage step. On a fresh cluster, bring the nodes up without kube-proxy (kubeadm init --skip-phases=addon/kube-proxy, or set the equivalent in your managed-cluster bootstrap). Then point Cilium at the API server directly, because with no kube-proxy there is no in-cluster kubernetes Service VIP to reach it through:

# Install the Cilium CLI (verifies and templates the Helm install for you)
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
curl -L --fail --remote-name-all \
  https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-amd64.tar.gz
sudo tar xzvfC cilium-linux-amd64.tar.gz /usr/local/bin
# API_SERVER_IP / PORT are your control-plane endpoint, since there is no kube-proxy
cilium install \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=${API_SERVER_IP} \
  --set k8sServicePort=${API_SERVER_PORT} \
  --set routingMode=tunnel \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true

Validate that the dataplane is healthy and that kube-proxy replacement is actually active before you trust it with policy:

cilium status --wait
# Look for: KubeProxyReplacement:  True
kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep KubeProxyReplacement

If you are migrating a live cluster rather than building fresh, do not flip this in place – see the migration section. Confirm end-to-end connectivity with the built-in suite, which spins up a set of client/server pods and exercises pod-to-pod, pod-to-service, and policy paths:

cilium connectivity test

3. CiliumNetworkPolicy vs native NetworkPolicy

Cilium enforces the upstream NetworkPolicy object faithfully, so existing policies keep working. But the native object tops out at L4 and IP/label selectors. CiliumNetworkPolicy (CNP, cilium.io/v2) adds the capabilities that make segmentation real: identity-based endpointSelector, toEntities for well-known peers (world, cluster, host, kube-apiserver), DNS-aware toFQDNs, and L7 rules for HTTP, Kafka, and DNS. A side-by-side on the same intent:

Capability | NetworkPolicy | CiliumNetworkPolicy
--- | --- | ---
L3/L4 by pod label | Yes | Yes
Match by stable identity (survives IP change) | Indirectly | Yes (native)
L7 HTTP method/path | No | Yes
Kafka / DNS protocol rules | No | Yes
Egress to FQDN (toFQDNs) | No | Yes
Cluster-wide (no namespace) | No | Yes (CiliumClusterwideNetworkPolicy)
Node/host firewall | No | Yes (host policy)

Crucially, CNP and native policy compose under Kubernetes’ additive allow semantics: with multiple policies selecting a pod, the union of their ingress/egress allows applies, and anything not allowed is denied. There is no rule ordering or priority among allows to reason about, unlike iptables: a packet is allowed if any selecting policy permits it. The one exception is Cilium’s explicit deny rules (ingressDeny/egressDeny), which always win over an allow.
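The additive semantics can be modeled in a few lines. This is a simplified sketch, not Cilium's evaluator: a rule is reduced to a (source app, port) pair, the policy names are only examples, and explicit deny rules are left out.

```python
Rule = tuple[str, int]  # (source app label, destination port)


def ingress_verdict(selecting: dict[str, list[Rule]], src_app: str, port: int):
    """Evaluate ingress for one destination pod.

    `selecting` holds only the policies whose selector matches that pod,
    keyed by policy name. Returns the verdict and the policies that allowed it.
    """
    if not selecting:
        return "FORWARDED", []  # no policy selects the pod: default allow
    allowed_by = [name for name, rules in selecting.items() if (src_app, port) in rules]
    return ("FORWARDED" if allowed_by else "DROPPED"), allowed_by


policies = {
    "default-deny": [],                        # selects the pod, allows nothing
    "api-allow": [("frontend", 8080)],
    "metrics-allow": [("prometheus", 9090)],
}

print(ingress_verdict(policies, "frontend", 8080))  # ('FORWARDED', ['api-allow'])
print(ingress_verdict(policies, "batch", 8080))     # ('DROPPED', [])
print(ingress_verdict({}, "batch", 8080))           # ('FORWARDED', [])
```

Order in the dictionary never matters: adding a policy can only widen what is allowed, which is why an unexpected FORWARDED always traces back to some other selecting policy.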

4. Default-deny per namespace, then build allows from observed traffic

The first policy in a namespace should lock it down. CiliumNetworkPolicy has no policyTypes field like the native object; instead, a policy that selects everything (endpointSelector: {}) and carries an ingress or egress section with no peers puts every pod in that namespace into default-deny for that direction:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: default-deny
  namespace: payments
spec:
  endpointSelector: {}
  ingress:
    - {}
  egress:
    - {}

The - {} under each is an empty rule that selects no peers – it denies all ingress and egress while engaging the policy engine for those directions. (Selecting a pod with an ingress rule at all is what flips it from “default allow” to “default deny” for ingress; the empty rule is the explicit, readable way to say so.)

Do not author the allow rules from memory. Turn on policy audit mode so Cilium reports what would be dropped without actually dropping it, run real traffic, and read the verdicts out of Hubble:

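# Per-endpoint audit: log denied flows as "would-be-dropped" but let them pass.
# POD is the workload to audit (here: the first app=api pod in payments).
POD=$(kubectl -n payments get pods -l app=api -o jsonpath='{.items[0].metadata.name}')
NODE=$(kubectl -n payments get pod $POD -o jsonpath='{.spec.nodeName}')
# The agent that owns the endpoint is the Cilium pod on the same node
CILIUM_POD=$(kubectl -n kube-system get pods -l k8s-app=cilium \
  --field-selector spec.nodeName=$NODE -o jsonpath='{.items[0].metadata.name}')
# The CiliumEndpoint object shares the pod's name and carries the endpoint ID
ENDPOINT_ID=$(kubectl -n payments get ciliumendpoint $POD -o jsonpath='{.status.id}')
kubectl -n kube-system exec $CILIUM_POD -- \
  cilium-dbg endpoint config $ENDPOINT_ID PolicyAuditMode=Enabled
# Watch what the default-deny WOULD drop, with full identity context
hubble observe --namespace payments --verdict AUDIT -f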

Every line is a flow you must consciously allow or leave blocked. Promote the legitimate ones into explicit rules and turn audit mode off. Here a frontend may reach the API on 8080, and the API may reach Postgres on 5432 – nothing else moves:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-allow
  namespace: payments
spec:
  endpointSelector:
    matchLabels:
      app: api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
  egress:
    - toEndpoints:
        - matchLabels:
            app: postgres
      toPorts:
        - ports:
            - port: "5432"
              protocol: TCP
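One way to turn audit output into a review list is to collapse the flows into unique (source app, destination app, protocol, port) tuples. The sketch below assumes the JSON-lines shape produced by hubble observe -o json, with each line wrapping a flow object that has verdict, source.labels, destination.labels, and l4 fields; the sample lines are hand-written for illustration, and the function names are hypothetical.

```python
import json

# Hand-written sample; real input would come from `hubble observe ... -o json`.
SAMPLE = [
    '{"flow":{"verdict":"AUDIT","source":{"labels":["k8s:app=frontend"]},'
    '"destination":{"labels":["k8s:app=api"]},"l4":{"TCP":{"destination_port":8080}}}}',
    '{"flow":{"verdict":"AUDIT","source":{"labels":["k8s:app=frontend"]},'
    '"destination":{"labels":["k8s:app=api"]},"l4":{"TCP":{"destination_port":8080}}}}',
    '{"flow":{"verdict":"AUDIT","source":{"labels":["k8s:app=api"]},'
    '"destination":{"labels":["k8s:app=postgres"]},"l4":{"TCP":{"destination_port":5432}}}}',
    '{"flow":{"verdict":"FORWARDED","source":{"labels":["k8s:app=api"]},'
    '"destination":{"labels":["k8s:k8s-app=kube-dns"]},"l4":{"UDP":{"destination_port":53}}}}',
]


def app_label(endpoint: dict) -> str:
    """Pull the value of the app label out of a Hubble endpoint's label list."""
    for label in endpoint.get("labels", []):
        if label.startswith("k8s:app="):
            return label.split("=", 1)[1]
    return "unknown"


def candidate_allows(lines):
    """Return sorted unique (src app, dst app, protocol, port) for AUDIT flows."""
    seen = set()
    for line in lines:
        flow = json.loads(line).get("flow", {})
        if flow.get("verdict") != "AUDIT":
            continue
        for proto, fields in flow.get("l4", {}).items():
            port = fields.get("destination_port")
            if port is not None:
                src = app_label(flow.get("source", {}))
                dst = app_label(flow.get("destination", {}))
                seen.add((src, dst, proto, port))
    return sorted(seen)


for src, dst, proto, port in candidate_allows(SAMPLE):
    print(f"{src} -> {dst} {proto}/{port}")
```

Each printed line is a candidate for an explicit rule like the api-allow policy above; a human still decides which ones are legitimate.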

Default-deny is only safe if you also allow DNS egress. Pods that cannot reach kube-dns fail every name resolution and look “broken” in ways unrelated to your real policy. Allow kube-system/kube-dns on UDP/TCP 53 in the same pass – and you will want an L7 DNS rule there anyway for FQDN policy (next section).

5. L7 policy: HTTP method/path, Kafka, and gRPC at the sidecar-free proxy

This is the capability native policy cannot touch. Cilium runs a node-level Envoy proxy (embedded in the agent, or deployed as a separate cilium-envoy DaemonSet in newer releases) with no per-pod sidecar and no injection, and transparently redirects L7-scoped traffic to it for parsing. You add a rules block under toPorts, and Cilium only forwards requests that match.

Restrict the frontend to read-only HTTP against the API: allow GET /api/v1/orders and the health endpoint, drop everything else (including any POST/DELETE) with an explicit 403 from the proxy:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-http-l7
  namespace: payments
spec:
  endpointSelector:
    matchLabels:
      app: api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: "GET"
                path: "/api/v1/orders"
              - method: "GET"
                path: "/healthz"

gRPC is HTTP/2, so you express gRPC method authorization with the same http rules – the path is /<package>.<Service>/<Method>:

          rules:
            http:
              - method: "POST"
                path: "/payments.PaymentService/GetStatus"

Kafka gets first-class L7 enforcement – you can authorize specific API keys and topics so a producer cannot consume, or a client cannot touch a topic it has no business reading:

          rules:
            kafka:
              - role: "produce"
                topic: "payment-events"
              - role: "consume"
                topic: "payment-events-dlq"

When the proxy denies an L7 request, the connection is not reset at L4 – it completes the TCP handshake and the application gets a protocol-level rejection (HTTP 403, a Kafka authorization error). That is a feature: callers get a clean, debuggable error instead of a mysterious timeout, and Hubble records the L7 verdict with the method and path.
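The HTTP matching in this section can be modeled as a list of (method, path) regular expressions. The sketch assumes both fields are regexes that must match the whole string; check the anchoring behavior against your Cilium version before relying on it. The function name is hypothetical and the rules are the ones from the examples above.

```python
import re

# (method regex, path regex) pairs from the HTTP and gRPC examples
RULES = [
    ("GET", "/api/v1/orders"),
    ("GET", "/healthz"),
    ("POST", "/payments.PaymentService/GetStatus"),
]


def proxy_decision(method: str, path: str, rules=RULES) -> str:
    """Return FORWARDED if any rule matches, else the 403 the proxy would send."""
    for method_re, path_re in rules:
        if re.fullmatch(method_re, method) and re.fullmatch(path_re, path):
            return "FORWARDED"
    return "403"


print(proxy_decision("GET", "/api/v1/orders"))     # FORWARDED
print(proxy_decision("DELETE", "/api/v1/orders"))  # 403
print(proxy_decision("GET", "/api/v1/orders/42"))  # 403 under the full-match assumption
# An unescaped "." is a regex wildcard, so this also matches:
print(proxy_decision("POST", "/paymentsXPaymentService/GetStatus"))  # FORWARDED
```

Two practical consequences follow if paths are treated as regexes: sub-resources need an explicit pattern such as "/api/v1/orders/.*", and literal dots in gRPC paths are worth escaping.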

6. Egress control with FQDN policies and toFQDNs DNS interception

Allowing egress to “the internet” by CIDR is hopeless: destinations move and you cannot allowlist a SaaS API by IP. Cilium solves this by intercepting DNS at L7. A toFQDNs rule works only if a policy selecting the same pod also allows DNS to a resolver with an L7 dns rule, because that rule is what routes the lookups through Cilium’s DNS proxy. Cilium reads those DNS responses, learns the IPs the name currently resolves to, and programs exactly those IPs into the egress allow, so the policy tracks DNS as it changes.

This pairing is mandatory and the most common thing people get wrong. The DNS rule both permits resolution and feeds the IP learning:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-egress-fqdn
  namespace: payments
spec:
  endpointSelector:
    matchLabels:
      app: api
  egress:
    # 1. Allow + intercept DNS so toFQDNs can learn the answers
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
            - port: "53"
              protocol: TCP
          rules:
            dns:
              - matchPattern: "*.stripe.com"
              - matchName: "api.stripe.com"
    # 2. Now allow egress only to those learned FQDNs, on 443
    - toFQDNs:
        - matchName: "api.stripe.com"
        - matchPattern: "*.stripe.com"
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP

The dns matchPattern/matchName in step 1 governs which names the pod may even resolve; toFQDNs in step 2 governs which resolved IPs it may then connect to. If a name is not allowed in the DNS rule, the DNS proxy rejects the query, Cilium never learns an answer for it, and toFQDNs can never permit it: a name you forgot to allow simply fails closed. Inspect what Cilium has learned for an endpoint:

kubectl -n kube-system exec ds/cilium -- \
  cilium-dbg fqdn cache list | grep stripe
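The two-step gating above can be sketched as a small state machine: a DNS gate that decides which lookups succeed, and a learned-IP table that decides which connections pass. This is a model under stated assumptions, not Cilium's implementation: it assumes "*" matches zero or more DNS-legal characters but not a dot, and that a rejected lookup is answered with REFUSED. The class name and the IPs (documentation ranges) are made up.

```python
import re


def compile_pattern(pattern: str) -> re.Pattern:
    """Translate a matchPattern-style wildcard into a regex (assumed semantics)."""
    escaped = re.escape(pattern.lower()).replace(r"\*", "[-a-z0-9_]*")
    return re.compile(escaped)


class FqdnGate:
    def __init__(self, dns_patterns: list[str], fqdn_patterns: list[str]):
        self.dns = [compile_pattern(p) for p in dns_patterns]    # step 1: may resolve
        self.fqdn = [compile_pattern(p) for p in fqdn_patterns]  # step 2: may connect
        self.learned: dict[str, str] = {}                        # IP -> name it came from

    def resolve(self, name: str, answers: list[str]) -> str:
        name = name.lower().rstrip(".")
        if not any(p.fullmatch(name) for p in self.dns):
            return "REFUSED"  # lookup blocked, so nothing is ever learned
        if any(p.fullmatch(name) for p in self.fqdn):
            for ip in answers:
                self.learned[ip] = name
        return "NOERROR"

    def connect(self, ip: str) -> str:
        return "FORWARDED" if ip in self.learned else "DROPPED"


gate = FqdnGate(dns_patterns=["*.stripe.com"], fqdn_patterns=["api.stripe.com"])

print(gate.resolve("api.stripe.com", ["203.0.113.10"]))  # NOERROR
print(gate.connect("203.0.113.10"))                      # FORWARDED
print(gate.resolve("example.com", ["198.51.100.7"]))     # REFUSED
print(gate.connect("198.51.100.7"))                      # DROPPED
```

Note that a name can pass the DNS gate without being connectable: in this model files.stripe.com resolves, but its IPs are not learned because only api.stripe.com appears in the toFQDNs list.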

7. ClusterwideNetworkPolicy and the host firewall

Some controls are not namespace-scoped. A baseline that must apply to every namespace – “no pod may egress to RFC1918 ranges outside the cluster,” or “all pods may always reach kube-dns” – belongs in a CiliumClusterwideNetworkPolicy (CCNP), which has no namespace field and selects across the whole cluster:

apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: allow-dns-cluster-wide
spec:
  endpointSelector: {}
  egress:
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
            - port: "53"
              protocol: TCP

Cilium can also firewall the nodes themselves, not just pods. The host firewall enforces policy on the host network namespace, letting you lock down node ports (kubelet 10250, etcd, SSH) with the same CRD model. Enable it, then target nodes with a nodeSelector, which takes the place of endpointSelector in a host policy and matches node labels:

cilium config set enable-host-firewall true   # or set hostFirewall.enabled via Helm

apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: lock-down-nodes
spec:
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""
  ingress:
    - fromEntities:
        - cluster
    - fromCIDR:
        - 10.20.0.0/16     # management/bastion range allowed to SSH
      toPorts:
        - ports:
            - port: "22"
              protocol: TCP

Host policies are default-deny the instant any host policy selects a node, exactly like pod policies. Apply a host firewall in audit mode first (PolicyAuditMode on the host endpoint): a too-tight host policy can sever kubelet from the API server and leave the node NotReady out from under you.

8. Hubble: tracing a dropped packet to the exact policy decision

The native stack can drop a packet and tell you nothing. Hubble’s entire reason to exist is to answer “why was this dropped, and which policy decided it.” Enable it (done in the install above) and open a port-forward, or use the CLI directly against the relay:

cilium hubble enable --ui           # if not enabled at install
cilium hubble port-forward &        # exposes the relay locally on :4245
hubble status                       # confirm flows are being collected

Watch only the denials, with both endpoints’ identities and the drop reason:

hubble observe --verdict DROPPED -f \
  --namespace payments -o compact
# Example line:
# payments/frontend-xxxx -> payments/api-yyyy  http-request  DROPPED  (Policy denied)

For an L7 verdict you get the method and path that was rejected, which makes “my GET works but my POST 403s” a five-second diagnosis instead of a packet-capture expedition:

# Each JSON line wraps the flow in a top-level "flow" object
hubble observe --namespace payments --protocol http --verdict DROPPED \
  --http-method POST -o json | jq '.flow.l7.http'

You can pivot on identity directly. To see everything a specific pod tried and what happened to it:

hubble observe --pod payments/api-yyyy --last 200

The Hubble UI (cilium hubble ui) renders the same data as a live service map: a green line is an allowed flow, a red line is a drop, and clicking the flow shows the verdict and the policy. For an on-call engineer, the service map is the fastest way to spot “this new policy broke a dependency we forgot about” – the broken edge turns red in real time.
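When the drop stream is too busy to read line by line, the JSON output can be rolled up per source, destination, and reason. The sketch assumes each line wraps a flow object with verdict, source.pod_name, destination.pod_name, drop_reason_desc, and (for L7) l7.http.method and l7.http.url; the sample lines are hand-written for illustration and the function names are hypothetical.

```python
import json
from collections import Counter

# Hand-written sample; real input would come from
# `hubble observe --namespace payments -o json`.
SAMPLE = [
    '{"flow":{"verdict":"DROPPED","drop_reason_desc":"POLICY_DENIED",'
    '"source":{"pod_name":"probe"},"destination":{"pod_name":"api-yyyy"}}}',
    '{"flow":{"verdict":"DROPPED","drop_reason_desc":"POLICY_DENIED",'
    '"source":{"pod_name":"probe"},"destination":{"pod_name":"api-yyyy"}}}',
    '{"flow":{"verdict":"DROPPED","source":{"pod_name":"frontend-xxxx"},'
    '"destination":{"pod_name":"api-yyyy"},'
    '"l7":{"http":{"method":"POST","url":"http://api:8080/api/v1/orders"}}}}',
    '{"flow":{"verdict":"FORWARDED","source":{"pod_name":"frontend-xxxx"},'
    '"destination":{"pod_name":"api-yyyy"},'
    '"l7":{"http":{"method":"GET","url":"http://api:8080/api/v1/orders"}}}}',
]


def describe(flow: dict) -> str:
    """Prefer the rejected HTTP request; fall back to the L3/L4 drop reason."""
    http = flow.get("l7", {}).get("http")
    if http:
        return f'{http.get("method", "?")} {http.get("url", "?")}'
    return flow.get("drop_reason_desc", "UNKNOWN")


def summarize_drops(lines) -> Counter:
    """Count DROPPED flows per (source pod, destination pod, reason)."""
    counts = Counter()
    for line in lines:
        flow = json.loads(line).get("flow", {})
        if flow.get("verdict") != "DROPPED":
            continue
        src = flow.get("source", {}).get("pod_name", "?")
        dst = flow.get("destination", {}).get("pod_name", "?")
        counts[(src, dst, describe(flow))] += 1
    return counts


for (src, dst, reason), n in summarize_drops(SAMPLE).most_common():
    print(f"{n:3d}  {src} -> {dst}  {reason}")
```

To run it against live data, replace SAMPLE with lines read from standard input and pipe the hubble observe output into the script.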

9. Migrating from kube-proxy without an outage

You cannot safely delete kube-proxy and install Cilium on a running cluster in one step: there is a window where neither is fully programming service translation and connections break. A controlled alternative combines kubeProxyReplacement with a per-node label gate (the Helm nodeSelector value below, keyed on a label name of your choosing) so the eBPF dataplane and kube-proxy coexist while you cut over node by node:

# 1. Install/upgrade Cilium with kube-proxy replacement, but gated to nodes
#    carrying a label, so existing kube-proxy keeps running everywhere else.
helm upgrade --install cilium cilium/cilium --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=${API_SERVER_IP} \
  --set k8sServicePort=${API_SERVER_PORT} \
  --set nodeSelector."io\.cilium\.migration/a-node"="cilium-after-migration"
# 2. Cut over one node: cordon/drain, label it so Cilium takes the dataplane,
#    restart the agent, then uncordon. Validate, then proceed to the next node.
NODE=worker-1
kubectl cordon $NODE
kubectl drain $NODE --ignore-daemonsets --delete-emptydir-data
kubectl label node $NODE --overwrite io.cilium.migration/a-node=cilium-after-migration
kubectl -n kube-system delete pod -l k8s-app=cilium --field-selector spec.nodeName=$NODE
kubectl -n kube-system rollout status ds/cilium
kubectl uncordon $NODE

Once every node is labeled and validated, remove the original kube-proxy DaemonSet and the nodeSelector gate so Cilium owns service translation cluster-wide:

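kubectl -n kube-system delete ds kube-proxy
# Clear stale iptables rules kube-proxy left behind. This must run on every node,
# so loop over all agent pods rather than exec'ing into a single one:
for p in $(kubectl -n kube-system get pods -l k8s-app=cilium -o name); do
  kubectl -n kube-system exec $p -- \
    nsenter -t 1 -m -- bash -c 'iptables-save | grep -v KUBE | iptables-restore'
done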

Verify

Prove the policy does what you claim – both the allow and the drop – with live traffic, not by reading YAML.

# 1. Dataplane is in kube-proxy-replacement mode and healthy
cilium status --wait
kubectl -n kube-system exec ds/cilium -- \
  cilium-dbg status | grep KubeProxyReplacement   # -> True

# 2. The default-deny works: an unlabeled pod canNOT reach the API
kubectl -n payments run probe --rm -it --image=nicolaka/netshoot --restart=Never -- \
  curl -m 3 http://api:8080/api/v1/orders        # expect: timeout (no allow)

# 3. The L7 allow works: a frontend pod CAN GET but cannot POST
kubectl -n payments exec deploy/frontend -- curl -s -o /dev/null -w "%{http_code}\n" \
  http://api:8080/api/v1/orders                  # expect: 200
kubectl -n payments exec deploy/frontend -- curl -s -o /dev/null -w "%{http_code}\n" \
  -X POST http://api:8080/api/v1/orders          # expect: 403 (proxy denied)

# 4. FQDN egress: allowed name connects, an un-allowed name fails closed
kubectl -n payments exec deploy/api -- curl -s -o /dev/null -w "%{http_code}\n" \
  https://api.stripe.com                          # expect: 200/3xx
kubectl -n payments exec deploy/api -- curl -m 3 https://example.com  # expect: timeout

# 5. Hubble confirms the verdicts with identities and the drop reason
hubble observe --namespace payments --verdict DROPPED --last 50 -o compact
hubble observe --namespace payments --protocol http --http-method POST --last 20

If step 2 returns 200, your default-deny is not engaged: check that an ingress rule actually selects the API pod. If step 3’s POST returns 200, the L7 rule is not redirecting to the proxy: confirm the http block is under toPorts.rules, not at the wrong nesting level. Hubble in step 5 is the source of truth: if a flow you expected to drop shows FORWARDED, some other policy is allowing it (additive semantics), and cilium-dbg policy selectors on the agent for that node helps you find which.

Enterprise scenario

A payments platform team ran a PCI-scoped cluster where a QSA finding was blunt: the cardholder-data service could egress to the internet on 443 with no restriction, because the only available control was a native NetworkPolicy, which is IP/port-only. The service legitimately needed to reach exactly one external dependency – the card processor’s tokenization API at *.cardprocessor.com – and nothing else. The team could not allowlist by IP: the processor published a wide, frequently-rotating set of addresses behind their CDN, and any static CIDR allow was stale within a week and too broad to pass audit. Their stopgap, a default-deny with a 0.0.0.0/0 egress hole on 443, was precisely what got flagged.

The fix was a toFQDNs egress policy paired with L7 DNS interception, which let them allowlist the dependency by name and let Cilium track the rotating IPs automatically. They scoped it to the cardholder-data pods only, allowed DNS resolution for just that domain, and connected on 443 solely to the learned answers:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: chd-egress-processor-only
  namespace: cde
spec:
  endpointSelector:
    matchLabels:
      app: cardholder-data
  egress:
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
          rules:
            dns:
              - matchPattern: "*.cardprocessor.com"
    - toFQDNs:
        - matchPattern: "*.cardprocessor.com"
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP

They shipped it in policy audit mode for 48 hours, watched hubble observe --namespace cde --verdict AUDIT to confirm the only external destinations the service actually reached were *.cardprocessor.com (catching one forgotten metrics endpoint in the process, which they added explicitly), then enforced. For the audit evidence they exported hubble observe --namespace cde --type drop -o json showing every attempt to any other destination being dropped with a policy verdict, plus cilium-dbg fqdn cache list proving the allow tracked the processor’s rotating IPs. The QSA closed the finding: egress was now restricted to a named dependency, the evidence was per-flow and continuous, and no human had to chase a CIDR list. The native NetworkPolicy simply could not have expressed it.
