Zero-Code Auto-Instrumentation with Grafana Beyla: eBPF Traces and RED Metrics

Every platform team carries a tier of services that will never be instrumented by hand: a Rust proxy nobody owns, a Go binary whose maintainer left, a third-party container you cannot rebuild. The classic answer is “wrap it in a service mesh” — but a mesh only sees what crosses the sidecar, doubles your data plane, and tells you nothing about TLS-terminated traffic inside the pod. Grafana Beyla takes a different route: it attaches eBPF probes to the kernel, watches the syscalls and library calls a process makes, and emits RED metrics and spans over OTLP. No code change, no recompile, no language SDK. This is how it works, how to deploy it, and where its limits are.

1. How Beyla hooks HTTP/HTTPS and gRPC with eBPF

Beyla loads small eBPF programs and attaches them to two kinds of hook points. The first is kprobes/tracepoints on kernel functions — the socket read/write path (tcp_sendmsg, tcp_recvmsg and friends) — so it can see request and response bytes flowing through a connection, parse the HTTP/1.x request line or the HTTP/2 (gRPC) frames, and time the round trip. The second is uprobes on user-space library functions, attached by inspecting a target binary’s symbol table and instrumenting specific offsets.
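To make the socket-path idea concrete, here is a minimal Python sketch of the kind of parsing that has to happen once request and response bytes have been captured. It is not Beyla's code (Beyla does this in eBPF and Go); the names parse_request_line and parse_status_code are hypothetical, and only HTTP/1.x is handled.

```python
from dataclasses import dataclass
from typing import Optional

HTTP_METHODS = {b"GET", b"HEAD", b"POST", b"PUT", b"DELETE",
                b"PATCH", b"OPTIONS", b"CONNECT", b"TRACE"}


@dataclass
class RequestLine:
    method: str
    target: str
    version: str


def parse_request_line(buf: bytes) -> Optional[RequestLine]:
    """Parse 'METHOD target HTTP/1.x' from the start of a captured buffer.

    Returns None when the buffer does not begin with a complete HTTP/1.x
    request line (for example, when it holds TLS ciphertext).
    """
    line, sep, _ = buf.partition(b"\r\n")
    if not sep:
        return None  # first line is not complete in this buffer
    parts = line.split(b" ")
    if len(parts) != 3:
        return None
    method, target, version = parts
    if method not in HTTP_METHODS or not version.startswith(b"HTTP/1."):
        return None
    return RequestLine(method.decode("ascii"),
                       target.decode("ascii", "replace"),
                       version.decode("ascii"))


def parse_status_code(buf: bytes) -> Optional[int]:
    """Parse the status code from an HTTP/1.x status line, or return None."""
    line, sep, _ = buf.partition(b"\r\n")
    parts = line.split(b" ", 2)
    if not sep or len(parts) < 2 or not parts[0].startswith(b"HTTP/1."):
        return None
    return int(parts[1]) if parts[1].isdigit() else None
```

The second assertion-style case worth noting is the ciphertext one: a buffer that starts with a TLS record header simply fails to parse, which is why the uprobe path described next matters.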

The uprobe path is what makes encrypted traffic observable. Wire bytes are useless if the service speaks HTTPS, so Beyla hooks SSL_read and SSL_write inside the process’s OpenSSL (libssl), and the equivalents in Go’s crypto/tls. Those functions handle the buffer after decryption on read and before encryption on write, so Beyla reads cleartext at exactly the moment the application does, without ever holding a private key.

Go gets special treatment. Because the Go runtime uses goroutines and its own scheduler rather than 1:1 OS threads, Beyla detects Go binaries (via the ELF build info) and attaches uprobes to specific runtime and net/http symbols so it can follow a request across goroutines and reconstruct the server span correctly. For everything else (Python, Node, Java, Ruby, .NET, Rust, C++) it works at the generic socket and libssl layer. The libssl hooks only help where the process actually terminates TLS through OpenSSL; a runtime that ships its own TLS stack, such as the JVM's default JSSE provider, does not pass through SSL_read/SSL_write.
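As a rough illustration of "detect Go binaries via the ELF build info", the sketch below checks for the ELF magic and for the marker that the Go toolchain embeds at the start of its build-info blob. This is a simplification and an assumption about approach: a real detector would locate the build-info section through the ELF headers rather than scanning the whole image, and looks_like_go_binary is a hypothetical name.

```python
ELF_MAGIC = b"\x7fELF"
# Marker that the Go toolchain writes at the start of its build-info blob.
GO_BUILDINFO_MAGIC = b"\xff Go buildinf:"


def looks_like_go_binary(image: bytes) -> bool:
    """Return True if the bytes look like an ELF file built by the Go toolchain.

    Simplified: scans the whole image for the marker instead of parsing
    ELF section headers.
    """
    return image.startswith(ELF_MAGIC) and GO_BUILDINFO_MAGIC in image
```

In practice the bytes would come from the candidate process's executable, for example by reading /proc/<pid>/exe.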

The practical consequence: Beyla gives you accurate server-side and client-side RED data and spans for HTTP/1.x, HTTP/2, gRPC, and TLS, in any language, with zero instrumentation. What it does not automatically give you is full cross-service trace context for non-Go services. Hold that thought for Step 4.

A few hard requirements. eBPF needs a reasonably modern kernel with BTF type information enabled: 5.8+ is the comfortable floor, and most managed Kubernetes node images are well past that. The process needs elevated capabilities: at minimum CAP_BPF and CAP_PERFMON, plus CAP_SYS_PTRACE and CAP_NET_RAW for the full feature set. In containers Beyla also needs to see the target process, which means sharing its PID namespace (shareProcessNamespace in a sidecar, hostPID in a DaemonSet), and that shapes the deployment model below.
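A quick preflight for these requirements can be sketched in Python. The helper names are hypothetical; the capability bit numbers are the standard Linux values, and the version comparison is deliberately coarse (it does not know about vendor kernels that backport eBPF features to older version numbers).

```python
import re
from typing import List, Tuple

# Linux capability bit positions (from linux/capability.h).
REQUIRED_CAPS = {
    "CAP_NET_RAW": 13,
    "CAP_SYS_PTRACE": 19,
    "CAP_PERFMON": 38,
    "CAP_BPF": 39,
}


def kernel_at_least(release: str, minimum: Tuple[int, int] = (5, 8)) -> bool:
    """Compare the major.minor part of a `uname -r` string against a floor."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        return False
    return (int(m.group(1)), int(m.group(2))) >= minimum


def missing_caps(cap_eff_hex: str) -> List[str]:
    """Given the CapEff hex mask from /proc/<pid>/status, list missing caps."""
    mask = int(cap_eff_hex, 16)
    return [name for name, bit in REQUIRED_CAPS.items()
            if not mask & (1 << bit)]
```

To use it on a node or inside the Beyla container, feed kernel_at_least the output of platform.release() and missing_caps the CapEff value from /proc/self/status.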

2. Deploy Beyla as a sidecar or a DaemonSet

There are two topologies, and the choice is a real architectural decision, not a detail.

Sidecar — one Beyla container per pod, sharing that pod’s network namespace, instrumenting exactly one application. Strong isolation and clean per-service ownership, at the cost of one extra container in every pod. Use BEYLA_OPEN_PORT to tell it which listening port to attach to.

# Pod spec excerpt: Beyla as a sidecar instrumenting the app on port 8080
spec:
  shareProcessNamespace: true        # Beyla must see the app process
  containers:
    - name: checkout
      image: registry.internal/checkout:1.7.3
      ports:
        - containerPort: 8080
    - name: beyla
      image: grafana/beyla:2.6.0      # pin; never float on :latest
      securityContext:
        privileged: true              # or the fine-grained caps below
      env:
        - name: BEYLA_OPEN_PORT
          value: "8080"
        - name: BEYLA_SERVICE_NAME
          value: "checkout"
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://otel-collector.observability:4317"

If your platform forbids privileged, drop it and grant the minimum set instead:

      securityContext:
        runAsUser: 0
        readOnlyRootFilesystem: true
        capabilities:
          add: ["BPF", "PERFMON", "SYS_PTRACE", "NET_RAW", "CHECKPOINT_RESTORE", "DAC_READ_SEARCH"]

DaemonSet — one Beyla per node, instrumenting many or all processes on that node. This is the high-leverage mode for a platform team: deploy once, get RED metrics for everything, opt services in by selector rather than by editing every Deployment. It needs hostPID: true so it can see processes across the node, and a discovery configuration so it doesn’t blindly instrument the kubelet.

# DaemonSet excerpt: one Beyla per node
spec:
  template:
    spec:
      hostPID: true
      containers:
        - name: beyla
          image: grafana/beyla:2.6.0
          securityContext:
            privileged: true
          env:
            - name: BEYLA_CONFIG_PATH
              value: /config/beyla-config.yml
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://otel-collector.observability:4317"
          volumeMounts:
            - name: config
              mountPath: /config
      volumes:
        - name: config
          configMap:
            name: beyla-config

Target selection in DaemonSet mode lives in the config file under discovery.instrument. You can match by executable name, by open port, or — the idiomatic Kubernetes way — by namespace and pod labels:

# beyla-config.yml
discovery:
  instrument:
    # Instrument everything in these namespaces...
    - k8s_namespace: payments
    # ...and any pod carrying this label, anywhere.
    - k8s_pod_labels:
        instrument: "beyla"
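    # Or match specific binaries by executable path. Values under
    # discovery.instrument are glob patterns, so list alternatives separately.
    - exe_path: "*/checkout"
    - exe_path: "*/api-gateway"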
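The selection logic can be pictured with a short Python sketch. It assumes that criteria inside one list entry must all match while separate entries are alternatives, and it uses Python's fnmatch as a stand-in for Beyla's own glob matcher; entry_matches, should_instrument, and the shape of the proc dictionary are hypothetical.

```python
from fnmatch import fnmatchcase


def entry_matches(entry: dict, proc: dict) -> bool:
    """All criteria present in one selector entry must match (logical AND)."""
    if "k8s_namespace" in entry and not fnmatchcase(
            proc.get("namespace", ""), entry["k8s_namespace"]):
        return False
    if "exe_path" in entry and not fnmatchcase(
            proc.get("exe_path", ""), entry["exe_path"]):
        return False
    for key, pattern in entry.get("k8s_pod_labels", {}).items():
        if not fnmatchcase(proc.get("labels", {}).get(key, ""), pattern):
            return False
    return True


def should_instrument(selectors: list, proc: dict) -> bool:
    """A process is instrumented if any selector entry matches (logical OR)."""
    return any(entry_matches(entry, proc) for entry in selectors)
```

Under these assumptions, a kubelet process in kube-system matches none of the three selectors above and is left alone, while any pod in payments is picked up regardless of its labels.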

3. Emit RED metrics and spans over OTLP

Beyla speaks native OTLP, so the destination is an OpenTelemetry Collector, not Beyla-specific plumbing. The two pillars it produces:

RED metrics — request Rate, Error rate, and request Duration as a histogram. The headline series are http.server.request.duration and http.client.request.duration (and the rpc.* equivalents for gRPC), following OTel semantic conventions, with attributes like http.request.method, http.response.status_code, and http.route.
Spans — one server span per inbound request (and client spans for outbound calls), carrying timing, status, and the route, exportable as OTLP traces.
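To ground what "RED" means numerically, the following Python sketch derives rate, error ratio, and a duration histogram from a batch of observed requests. The bucket boundaries are illustrative rather than Beyla's defaults, and counting only 5xx responses as errors is an assumption made for the example.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, Sequence

# Illustrative upper bounds in seconds; the final implicit bucket is +Inf.
BUCKETS = (0.005, 0.025, 0.1, 0.5, 1.0, 5.0)


@dataclass
class Request:
    duration_s: float
    status_code: int


def red_summary(requests: Sequence[Request], window_s: float,
                bounds: Sequence[float] = BUCKETS) -> Dict[str, object]:
    """Compute Rate, Error ratio, and a Duration histogram for one window."""
    counts = [0] * (len(bounds) + 1)  # last slot holds values above all bounds
    errors = 0
    for r in requests:
        # bisect_left gives the first bound >= duration ("le" semantics).
        counts[bisect_left(bounds, r.duration_s)] += 1
        if r.status_code >= 500:
            errors += 1
    total = len(requests)
    return {
        "rate_per_s": total / window_s,
        "error_ratio": errors / total if total else 0.0,
        "bucket_counts": counts,
    }
```

The bucket counts here are per-bucket rather than cumulative; a Prometheus-style histogram would expose the running sum of these counts under each le label.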

Turn both on explicitly. Each exporter is enabled by giving it an endpoint, and the metrics exporter's features list controls which metric families Beyla generates:

# beyla-config.yml (continued)
otel_metrics_export:
  endpoint: http://otel-collector.observability:4317
  protocol: grpc
  features: ["application"]   # app-level RED metrics
otel_traces_export:
  endpoint: http://otel-collector.observability:4317
  protocol: grpc

# Which workloads to instrument (same selector syntax as Step 2).
discovery:
  instrument:
    - k8s_namespace: payments

# Decorate metrics and spans with Kubernetes metadata (see Step 6).
attributes:
  kubernetes:
    enable: true

trace_printer: disabled   # set to text/json only for local debugging

To control telemetry volume, set the histogram buckets and sampling at the source rather than shipping everything and dropping it downstream. Beyla honors the standard OTel sampler variables:

# Tail-friendly: sample 100% at the agent, decide centrally in the Collector.
export OTEL_TRACES_SAMPLER=parentbased_always_on
# Or thin at the source if the Collector is not doing tail sampling:
# export OTEL_TRACES_SAMPLER=parentbased_traceidratio
# export OTEL_TRACES_SAMPLER_ARG=0.05
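For intuition about what parentbased_traceidratio does with an argument of 0.05, here is a simplified Python sketch. It follows the general idea of ratio sampling on the trace ID (a deterministic decision so every hop agrees) but is not a byte-for-byte copy of any SDK's algorithm; the function names are hypothetical.

```python
from typing import Optional


def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Deterministically keep roughly `ratio` of traces, keyed on the trace ID."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    low64 = int(trace_id_hex, 16) & 0xFFFFFFFFFFFFFFFF
    return low64 < int(ratio * (1 << 64))


def parentbased_traceidratio(trace_id_hex: str, ratio: float,
                             parent_sampled: Optional[bool] = None) -> bool:
    """Follow the parent's decision when there is one; otherwise use the ratio."""
    if parent_sampled is not None:
        return parent_sampled
    return should_sample(trace_id_hex, ratio)
```

Because the decision depends only on the trace ID and the parent flag, two agents configured with the same ratio keep or drop the same traces, which avoids half-sampled traces downstream.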

On the Collector side this is an ordinary OTLP receiver. Nothing about the pipeline is Beyla-aware:

# otel-collector.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch: {}
exporters:
  otlphttp/tempo:
    endpoint: http://tempo-distributor.observability:4318
  prometheusremotewrite:
    endpoint: http://mimir-nginx.observability/api/v1/push
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/tempo]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]

If you would rather scrape than push metrics, Beyla can expose a Prometheus endpoint instead of (or alongside) OTLP via prometheus_export.port — handy when your metrics path is already pull-based and you only want to push traces.

4. Decrypt TLS and understand the context-propagation limits

This section separates people who deploy Beyla successfully from people who file confused tickets.

TLS is handled, and handled well. Because Beyla hooks SSL_read/SSL_write (and Go’s crypto/tls) at the library boundary, HTTPS services produce the same clean RED data and spans as plaintext ones. No keys, no MITM, no mesh — the biggest advantage over wire-sniffing approaches, which see only ciphertext.

Automatic trace propagation is the real limit. A distributed trace requires the W3C traceparent header to be injected into outbound requests and read from inbound ones, so the next hop continues the same trace. For Go services, Beyla does this end to end — its Go-specific uprobes read and write the header in the running process, so a Go-to-Go chain stitches into one trace automatically. For non-Go services, Beyla observes the request but generally cannot rewrite outbound headers from the kernel, so each service produces correct local spans that do not automatically link across the hop.
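The header at the center of this is small enough to show in full. The Python sketch below parses a version-00 W3C traceparent value and builds the one a service would send on its outbound call; the helper names are hypothetical, and future header versions with extra fields are simply rejected to keep the example short.

```python
import re
from typing import NamedTuple, Optional

# version - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex), lowercase
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceContext(NamedTuple):
    trace_id: str
    parent_span_id: str
    sampled: bool


def parse_traceparent(value: str) -> Optional[TraceContext]:
    """Return the trace context carried by a traceparent header, or None."""
    m = _TRACEPARENT.match(value.strip())
    if not m:
        return None
    version, trace_id, span_id, flags = m.groups()
    # Version ff and all-zero IDs are invalid per the W3C specification.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return TraceContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_traceparent(ctx: TraceContext, new_span_id: str) -> str:
    """Build the header for an outbound call: same trace ID, new span ID."""
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{new_span_id}-{flags}"
```

The trace ID is the only part that survives the hop unchanged. If nothing writes this header into the outbound request, the next service has no trace ID to continue and starts a fresh trace, which is exactly the broken link described above.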

The supported way to bridge this without code changes is Beyla’s own context propagation between Beyla-instrumented services. Enable it deliberately; it is strongest within a Beyla-instrumented estate:

# beyla-config.yml
# Stitch spans across Beyla-instrumented services.
ebpf:
  context_propagation: all   # 'headers' for HTTP only; 'ip' as a fallback

The honest mental model: Beyla gives you excellent per-service RED and per-hop spans for free, and automatic cross-service traces that are cleanest among Go services or within a fully Beyla-instrumented set. Where a request crosses into a service that already speaks W3C Trace Context via an SDK, you want those two worlds to interoperate — the next step.

5. Combine Beyla auto-traces with SDK manual spans

Beyla and the OpenTelemetry SDK compose rather than compete. The pattern in mature estates is Beyla for breadth, SDK for depth: Beyla blankets every service in RED metrics and server spans, and the few services that need business-level detail also run the SDK to add child spans, baggage, and attributes the kernel can never see.

The key to one trace is shared context. Because Beyla emits and (for Go / within its instrumented set) honors W3C traceparent, a downstream SDK-instrumented service finds the header on the inbound request and continues the trace rather than starting a new one — as long as both ends use the W3C propagator (the OTel default):

# On the SDK-instrumented service, keep the default W3C propagator so it
# continues a trace context that Beyla (or another SDK) started.
export OTEL_PROPAGATORS=tracecontext,baggage

Inside that service, the manual span you add nests under the server span Beyla (or the SDK’s own auto-instrumentation) created:

from opentelemetry import trace

tracer = trace.get_tracer("checkout")

def settle_payment(order):
    # This span automatically becomes a child of the active server span.
    with tracer.start_as_current_span("settle_payment") as span:
        span.set_attribute("payment.provider", order.provider)
        span.set_attribute("payment.amount_cents", order.amount_cents)
        # _charge() is the service's existing payment call, defined elsewhere.
        return _charge(order)

A practical rule to avoid double-counting: do not run Beyla and an SDK auto-instrumentation agent on the same process and let both emit the server span, or you will get two spans for one request. Pick one source for the entry span per service — usually Beyla for the long tail, the SDK for the few services that already have it — and use the other only to add detail.

6. Service-name discovery, Kubernetes enrichment, and routing

A trace is useless if every service shows up as unknown_service. Beyla resolves service identity in a priority order: an explicit BEYLA_SERVICE_NAME/OTEL_SERVICE_NAME wins; otherwise in Kubernetes it derives the name from the workload (Deployment/StatefulSet/DaemonSet) owning the pod; failing that it falls back to the executable name. For anything you care about, set it explicitly or let the Kubernetes decorator do the work.
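The priority order reads naturally as a fallback chain. This Python sketch mirrors the description above; resolve_service_name is a hypothetical helper, and checking BEYLA_SERVICE_NAME before OTEL_SERVICE_NAME is an assumption, since the text only says that an explicit name wins.

```python
import posixpath
from typing import Mapping, Optional


def resolve_service_name(env: Mapping[str, str],
                         owner_name: Optional[str] = None,
                         exe_path: Optional[str] = None) -> str:
    """Pick a service name: explicit env var, then K8s owner, then executable."""
    # Assumed order between the two variables; either one counts as explicit.
    for var in ("BEYLA_SERVICE_NAME", "OTEL_SERVICE_NAME"):
        if env.get(var):
            return env[var]
    if owner_name:
        return owner_name  # e.g. the Deployment that owns the pod
    if exe_path:
        return posixpath.basename(exe_path)
    return "unknown_service"
```

The last branch is the one to avoid in production: a name derived from the executable (java, python3, node) is technically not unknown_service but is rarely any more useful on a dashboard.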

Enable Kubernetes metadata enrichment so every metric and span carries k8s.namespace.name, k8s.pod.name, k8s.deployment.name, and k8s.node.name — the attributes your dashboards and routing depend on:

# beyla-config.yml
attributes:
  kubernetes:
    enable: true
    cluster_name: prod-eu-west-1   # otherwise inferred where possible

This needs RBAC, because Beyla queries the Kubernetes API to map PIDs to pods. Bind the ClusterRole below to Beyla's ServiceAccount with a ClusterRoleBinding:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: beyla
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "nodes"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["replicasets", "deployments", "daemonsets", "statefulsets"]
    verbs: ["get", "list", "watch"]

The route attribute deserves attention because it is the difference between a usable RED dashboard and a cardinality explosion. Left alone, /orders/12345 and /orders/67890 are distinct, and your http.route label count grows without bound. Give Beyla route patterns so it collapses them into a single low-cardinality route:

# beyla-config.yml
routes:
  patterns:
    - /orders/{id}
    - /users/{id}/cart
  unmatched: heuristic   # auto-group anything not matched, instead of leaving it raw

unmatched: heuristic is the safety net: anything that doesn’t match an explicit pattern gets auto-grouped (numeric and UUID-like segments wildcarded) rather than emitted verbatim. On a public-facing API that one setting is the difference between a few hundred route series and a few hundred thousand.
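The effect on cardinality is easiest to see in code. This Python sketch imitates the two stages: explicit patterns first, then a fallback that wildcards segments that look like IDs. It is an approximation written for illustration; the function names are hypothetical, and Beyla's actual heuristic may classify segments differently.

```python
import re
from typing import Sequence

_UUID = re.compile(
    r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")


def match_pattern(pattern: str, path: str) -> bool:
    """True if path fits pattern, where a {name} segment matches any value."""
    p_segs = pattern.strip("/").split("/")
    segs = path.strip("/").split("/")
    if len(p_segs) != len(segs):
        return False
    return all((p.startswith("{") and p.endswith("}")) or p == s
               for p, s in zip(p_segs, segs))


def heuristic_route(path: str) -> str:
    """Replace numeric and UUID-like segments with '*'."""
    segs = path.strip("/").split("/")
    return "/" + "/".join(
        "*" if s.isdigit() or _UUID.match(s) else s for s in segs)


def route_for(path: str, patterns: Sequence[str]) -> str:
    """Explicit patterns win; anything unmatched goes through the heuristic."""
    path = path.split("?", 1)[0]  # the query string is not part of the route
    for pattern in patterns:
        if match_pattern(pattern, path):
            return pattern
    return heuristic_route(path)
```

With the two patterns from the config above, /orders/12345 and /orders/67890 both map to /orders/{id}, and an unlisted path such as /invoices/9001/pdf collapses to /invoices/*/pdf instead of minting a new series per invoice.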

7. Verify

Confirm each layer end to end before you trust a dashboard.

Beyla is up and attached. Its own logs name the processes it instrumented:

kubectl -n observability logs ds/beyla | grep -i "instrumenting\|found process\|discovered"

Metrics are flowing. Generate a little traffic, then query Prometheus/Mimir for the RED histogram Beyla emits:

# Request rate per service and route over 5m, from Beyla-produced server metrics.
curl -s "http://mimir-query-frontend.observability/prometheus/api/v1/query" \
  --data-urlencode 'query=sum by (service_name, http_route) (rate(http_server_request_duration_seconds_count[5m]))'

You should see one series per service_name and http_route pair. Drop the sum by (...) aggregation to inspect the raw series and confirm your k8s_* attributes are attached. If service_name is unknown_service, revisit Step 6.
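If you want this check in a smoke test rather than eyeballing curl output, a small Python helper can scan the query response. It assumes the standard Prometheus HTTP API response shape (status, data.result, metric labels) and a query that keeps service_name as a grouping label; unnamed_routes is a hypothetical name.

```python
import json
from typing import List


def unnamed_routes(response_body: str) -> List[str]:
    """Return the routes whose series lack a usable service_name label."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("query failed: %s" % payload.get("error", "unknown"))
    bad = []
    for series in payload["data"]["result"]:
        labels = series["metric"]
        name = labels.get("service_name", "")
        # Covers both "unknown_service" and "unknown_service:<process>".
        if not name or name.startswith("unknown_service"):
            bad.append(labels.get("http_route", "<no route>"))
    return bad
```

An empty list means every series resolved to a real service name; anything else is the list of routes to chase back to a workload.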

Spans are flowing and TLS is decrypted. Query Tempo for a recent trace from the service and confirm you get real server spans with the route and status — not empty or ciphertext-derived garbage — even on your HTTPS endpoints:

curl -s "http://tempo-query-frontend.observability:3200/api/search?tags=service.name%3Dcheckout&limit=5"

Cross-service stitching (Go / Beyla estate). Pick a request that fans out and confirm in Tempo that downstream spans share the same trace ID as the entry span. If non-Go hops break the chain, that is expected per Step 4 — add the SDK there or rely on per-hop RED.

Overhead sanity check. Watch the Beyla container’s own CPU and memory; on a busy node it should be modest. If it is climbing, your discovery.instrument selector is almost certainly too broad — tighten it to the namespaces and labels you actually need.

Enterprise scenario

A payments platform team ran a polyglot estate: the customer-facing APIs were Go, but the fraud-scoring and ledger services were a Python monolith and a vendor-supplied JVM container they were contractually forbidden to modify or rebuild. Go services were beautifully traced via the OTel Go SDK into existing Tempo and Mimir — but every trace went dark the instant it crossed into Python or the JVM black box, and the vendor container had no metrics at all. A service mesh was floated and rejected: it would have doubled the data plane and still missed the TLS-terminated, in-pod calls the fraud service made to a local cache.

They deployed Beyla as a DaemonSet scoped to the payments and fraud namespaces, with Kubernetes enrichment and route patterns. Two outcomes landed immediately. The unmodifiable JVM vendor container went from zero signal to full RED metrics and per-request server spans, including the HTTPS calls that passed through a TLS library Beyla could hook, read in cleartext with no keys handed over. And the Go APIs kept their rich SDK traces: because both Beyla and the SDK spoke W3C traceparent with the default propagator, Go entry spans and Beyla-observed downstream hops shared a trace ID where the chain was Go-originated.

The one deliberate decision was avoiding double server spans. The Go services already emitted entry spans via their SDK, so for those namespaces they ran Beyla in metrics-only mode and let the SDK own traces, while the Python and JVM namespaces used Beyla for both. The configuration that made that split clean:

# beyla-config.yml on the Go-services node pool: RED metrics only,
# let the existing OTel SDK own the spans to avoid duplicate entry spans.
discovery:
  instrument:
    - k8s_namespace: go-apis
otel_metrics_export:
  endpoint: http://otel-collector.observability:4317
  protocol: grpc
# (otel_traces_export intentionally omitted here)

Total time to first useful dashboard for the previously-invisible vendor container: under a day, with no application change and no vendor ticket.

Written by Vinod
