
Building a Linux Audit Trail with auditd and eBPF Runtime Visibility

When a host is compromised, the question your responders ask is never “was the firewall on”; it is “what did the process do, as which user, to which file, and where did the bytes go.” Answering that needs a tamper-evident record of kernel-level activity, and on Linux that record has two complementary sources. auditd is the userspace daemon for the kernel’s own audit subsystem: authoritative, syscall-accurate, and the record that PCI DSS and similar compliance assessors typically expect as the system of record. eBPF tooling layers on top: low-overhead, context-rich, and able to resolve the container ID and Kubernetes pod that the raw audit log knows nothing about. This guide builds both, wires them together, and ships the result somewhere you can query it.

If you take one thing away: auditd is your compliance system of record and eBPF is your detection layer. They are not redundant. The audit log proves what the kernel saw; the eBPF layer tells you what it meant — which container, which image, which network peer — at a fraction of the per-event cost.

1. The auditd architecture: kernel, daemon, dispatcher

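The Linux audit subsystem is three pieces, and confusing them is the source of most misconfiguration: the kernel audit framework, which evaluates the loaded rules and generates records; the auditd daemon, which receives those records over a netlink socket and writes them to /var/log/audit/audit.log; and the dispatcher (audispd, folded into auditd itself since audit 3.0), which relays events in real time to the plugins configured under /etc/audit/plugins.d/.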

Confirm the daemon and version before you touch anything:

sudo auditctl -s          # status: enabled, pid, backlog, lost, rate limits
auditctl -v               # audit userspace version (expect 3.x on current distros)
sudo systemctl status auditd

A note that trips people up: on RHEL, systemctl restart auditd is deliberately refused. The unit sets RefuseManualStop=yes, because a stop issued through systemd cannot be attributed to the login user who requested it, and that would leave a hole in the trail. Use service auditd restart, or reload rules without restarting the daemon at all, which is what you will do 99% of the time.

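The auditctl -s output is the single most important diagnostic. Two fields matter: lost, the number of records the kernel discarded because its queue overflowed (anything above zero means the trail has gaps), and backlog, the current queue depth, which should sit well below backlog_limit.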
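As a rough illustration of watching those counters programmatically, the sketch below parses text shaped like auditctl -s output and flags a non-zero lost counter or a backlog that is close to backlog_limit. The sample text, the function names, and the 80% threshold are assumptions made for this example, not values taken from the audit tooling.

```python
def parse_audit_status(text: str) -> dict[str, int]:
    """Parse 'key value' lines from `auditctl -s` into integers where possible."""
    status = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1].lstrip("-").isdigit():
            status[parts[0]] = int(parts[1])
    return status


def audit_health(status: dict[str, int], backlog_ratio: float = 0.8) -> list[str]:
    """Return warnings; an empty list means neither check found a problem."""
    warnings = []
    if status.get("lost", 0) > 0:
        warnings.append(f"lost={status['lost']}: records were dropped")
    limit = status.get("backlog_limit", 0)
    if limit and status.get("backlog", 0) >= backlog_ratio * limit:
        warnings.append(
            f"backlog={status['backlog']} is close to backlog_limit={limit}"
        )
    return warnings


# Illustrative sample shaped like `auditctl -s` output (not captured from a real host)
SAMPLE = """enabled 1
failure 1
pid 812
rate_limit 0
backlog_limit 64
lost 1532
backlog 61
backlog_wait_time 60000
"""

if __name__ == "__main__":
    for warning in audit_health(parse_audit_status(SAMPLE)):
        print(warning)
```

In practice you would feed it the captured output of sudo auditctl -s from a cron job or exporter and alert on any non-empty result.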

2. Authoring syscall, file watch, and execve rules

Rules come in two flavours. File watches (-w) trigger when a path is accessed with given permissions. Syscall rules (-a) trigger when a specific system call is made, optionally filtered by arguments. You can load them live with auditctl, but anything you want to survive a reboot belongs in a file under /etc/audit/rules.d/, which augenrules compiles into the active rule set.

Start with the structure of the rules directory. augenrules concatenates the files in natural sort order, so the naming convention is numeric prefixes:

ls /etc/audit/rules.d/
# 10-base-config.rules   30-custom.rules   99-finalize.rules

File watches

Watch the files whose modification is always suspicious — the identity store, the sudoers policy, the SSH and PAM configuration:

# /etc/audit/rules.d/30-custom.rules
## Identity and authorisation
-w /etc/passwd      -p wa -k identity
-w /etc/shadow      -p wa -k identity
-w /etc/group       -p wa -k identity
-w /etc/sudoers     -p wa -k privesc
-w /etc/sudoers.d/  -p wa -k privesc

## Remote access and auth stack
-w /etc/ssh/sshd_config -p wa -k sshd
-w /etc/pam.d/          -p wa -k pam

-p wa audits write and attribute changes (mode, owner, xattrs) but not reads — auditing reads of /etc/passwd would bury you in noise since every getent touches it. The -k key is a free-text tag; it is the single most useful field you will set, because it is how you slice the log later with ausearch -k.

Syscall rules

Syscall rules are where the real coverage lives. The canonical example is auditing privilege escalation: any setuid or setgid call that succeeds and was made in a session belonging to a real login user (auid >= 1000), which is the signature of a user gaining root:

## Privilege escalation: successful setuid by an unprivileged user
-a always,exit -F arch=b64 -S setuid  -F auid>=1000 -F auid!=unset -F exit=0 -k privesc
-a always,exit -F arch=b64 -S setgid  -F auid>=1000 -F auid!=unset -F exit=0 -k privesc

## Loading/unloading kernel modules (classic rootkit vector)
-a always,exit -F arch=b64 -S init_module,finit_module,delete_module -k modules

## Time changes (anti-forensics: rolling the clock to confuse timelines)
-a always,exit -F arch=b64 -S adjtimex,settimeofday,clock_settime -k time-change

Decode the syntax, because every token is load-bearing:

Token            Meaning
-a always,exit   Add a rule on the exit of the syscall, always recording
-F arch=b64      Filter to 64-bit syscalls. You must also add a b32 rule if 32-bit binaries can run, or an attacker evades you by using the 32-bit ABI
-S setuid        The syscall name (multiple comma-separated names allowed on one rule)
-F auid>=1000    The audit UID: the login identity, immutable across su/sudo. This is the field that survives privilege drops
-F auid!=unset   Exclude kernel threads and daemons started before login (audit UID of 4294967295)
-F exit=0        Only successful calls

The auid (audit/login UID) is the most important concept in audit rule design. Real UID changes when you sudo; auid does not. It is set at login and is the only reliable way to attribute a root action back to the human who initiated the session.
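Because a b64 rule without a b32 twin leaves an evasion path, it can help to lint a rules file for that gap. The sketch below is one minimal way to do it, assuming a purely textual comparison: it treats two rules as a pair when they are identical apart from the arch filter. The function name and the sample rules are illustrative, and the check does not know about syscalls whose names differ between ABIs.

```python
import re

ARCH_FILTER = re.compile(r"-F\s+arch=(b64|b32)\b")


def missing_b32_rules(rules_text: str) -> list[str]:
    """Return b64 syscall rules with no b32 rule that is otherwise identical."""
    by_arch = {"b64": {}, "b32": {}}
    for line in rules_text.splitlines():
        line = line.strip()
        if not line.startswith("-a "):
            continue  # skip comments, file watches, and control lines
        match = ARCH_FILTER.search(line)
        if not match:
            continue
        # Drop the arch filter and normalise whitespace to get a comparable signature
        signature = " ".join(ARCH_FILTER.sub("", line).split())
        by_arch[match.group(1)][signature] = line
    return [
        line for signature, line in by_arch["b64"].items()
        if signature not in by_arch["b32"]
    ]


# Illustrative input: execve is covered on both ABIs, setuid only on b64
RULES = """
-a always,exit -F arch=b64 -S execve -F auid>=1000 -F auid!=unset -k exec
-a always,exit -F arch=b32 -S execve -F auid>=1000 -F auid!=unset -k exec
-a always,exit -F arch=b64 -S setuid  -F auid>=1000 -F auid!=unset -F exit=0 -k privesc
"""

if __name__ == "__main__":
    for rule in missing_b32_rules(RULES):
        print("no b32 counterpart:", rule)
```

Run against the files in /etc/audit/rules.d/, a non-empty result is a prompt to either add the b32 rule or record why 32-bit binaries cannot run on that host.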

execve: the process-execution firehose

Auditing every process execution gives you a complete command-line history of the box. It is also the highest-volume rule you will write, so treat it deliberately:

## Every program execution, both ABIs, attributed to the login user
-a always,exit -F arch=b64 -S execve -F auid>=1000 -F auid!=unset -k exec
-a always,exit -F arch=b32 -S execve -F auid>=1000 -F auid!=unset -k exec

Filtering on auid>=1000 here is the difference between a usable log and a 50 GB/day fire hose: it drops the thousands of executions from systemd, cron, and package management that you do not care about, and keeps the interactive activity that you do.

Load the rules and verify they compiled:

sudo augenrules --load          # compile rules.d/ into the active set, persistently
sudo auditctl -l                # list the loaded rules (should match your files)

3. Mapping rules to a baseline and reducing noise

Do not hand-write a rule set from scratch. Two maintained baselines exist and both are correct starting points:

The practical workflow is to copy the baseline, then subtract noise — never add coverage you cannot afford to read. The two ordering rules that govern the entire file:

  1. -D first, finalize last. Start with -D (delete all rules) in 10-base-config.rules and end with -e 2 in 99-finalize.rules. -e 2 makes the rule set immutable until reboot — an attacker with root cannot then unload your auditing without a reboot, which itself is an audited, noisy event.
  2. First match wins for exclusions. Audit evaluates rules top-to-bottom and stops at the first match. To suppress noise, put an exclude rule before the broad rule that would otherwise catch it.
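The second ordering rule is easier to see with a toy model. The sketch below is a deliberately simplified stand-in for one filter list, assuming each rule is just an action plus a predicate; the event fields and the "backup-agent" actor are hypothetical, and the real kernel matcher works on audit fields rather than Python dictionaries.

```python
from typing import Callable

Event = dict[str, object]
Rule = tuple[str, Callable[[Event], bool]]  # (action, predicate)


def decide(rules: list[Rule], event: Event) -> str:
    """Return the action of the first rule whose predicate matches the event."""
    for action, matches in rules:
        if matches(event):
            return action
    return "no-match"


# A broad rule that records every openat, and a narrow rule that suppresses one actor
broad: Rule = ("always", lambda e: e["syscall"] == "openat")
suppress: Rule = (
    "never",
    lambda e: e["syscall"] == "openat" and e["actor"] == "backup-agent",
)

backup_event: Event = {"syscall": "openat", "actor": "backup-agent"}

if __name__ == "__main__":
    print(decide([suppress, broad], backup_event))  # never: suppression listed first
    print(decide([broad, suppress], backup_event))  # always: suppression never reached
```

The same two rules give opposite outcomes depending only on their order, which is why suppression rules belong in a lower-numbered file than the broad rules they carve out of.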

The single most effective noise reduction is the exclude filter, which drops whole record types before they are ever written. The textbook case is CWD (current-working-directory) records, which double the volume of every execve event for marginal forensic value:

## Drop the noisiest record types entirely
-a always,exclude -F msgtype=CWD
-a always,exclude -F msgtype=CONFIG_CHANGE -F auid=unset

For service accounts that legitimately make audited syscalls in a tight loop, exclude that specific actor rather than disabling the rule globally:

## A backup agent that legitimately walks the filesystem all night
-a never,exit -F arch=b64 -F auid=991 -F dir=/srv/backup -k backup-exclude

Resist the urge to fix volume by raising auid thresholds or deleting rules wholesale. Every rule you drop is a control an assessor will ask you to justify. Use targeted exclude/never rules with a -k key so the suppression itself is documented and greppable.

4. Parsing records with ausearch and aureport

The raw audit.log is deliberately machine-oriented — a single logical event is split across multiple lines sharing an event ID, with values hex-encoded. Never grep it directly. The two tools that exist for this are ausearch (find and reassemble events) and aureport (summarise).

ausearch reassembles the multi-line event and, with -i, interprets the raw values — resolving UIDs to names, syscall numbers to names, and hex-encoded paths back to text:

# Everything tagged 'privesc' in the last 10 minutes ('recent'), interpreted, human time
# (use --start today or --start yesterday for a longer window)
sudo ausearch -k privesc -i --start recent

# All execve events for a specific login user (auid) since boot
sudo ausearch -k exec -ul 1000 -i --start boot

# Anything touching a specific file, by full path
sudo ausearch -f /etc/shadow -i

A reassembled, interpreted execve event looks like this — note that auid survived the sudo, naming the human responsible:

type=PROCTITLE ... proctitle=cat /etc/shadow
type=SYSCALL ... arch=x86_64 syscall=execve success=yes exit=0
  ppid=4120 pid=4188 auid=alice uid=root gid=root
  euid=root ... comm="cat" exe="/usr/bin/cat" key="exec"
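To make concrete what ausearch -i is undoing, the sketch below groups raw lines by the timestamp and serial inside msg=audit(...) and decodes a hex-encoded proctitle. It is a teaching aid under stated assumptions, not a replacement for ausearch: the sample lines are hand-written and abbreviated, and the helper names are made up for this example.

```python
import re
from collections import defaultdict

EVENT_ID = re.compile(r"msg=audit\((\d+\.\d+:\d+)\)")


def group_by_event(raw_log: str) -> dict[str, list[str]]:
    """Group raw audit lines by the 'timestamp:serial' inside msg=audit(...)."""
    events = defaultdict(list)
    for line in raw_log.splitlines():
        match = EVENT_ID.search(line)
        if match:
            events[match.group(1)].append(line)
    return dict(events)


def decode_proctitle(hex_value: str) -> str:
    """Decode a hex-encoded proctitle; arguments are separated by NUL bytes."""
    raw = bytes.fromhex(hex_value)
    return raw.replace(b"\x00", b" ").decode("utf-8", "replace")


# Hand-written, abbreviated sample records (real records carry many more fields)
RAW = (
    'type=SYSCALL msg=audit(1700000000.123:456): arch=c000003e syscall=59 '
    'success=yes exit=0 auid=1000 uid=0 comm="cat" exe="/usr/bin/cat" key="exec"\n'
    'type=PROCTITLE msg=audit(1700000000.123:456): '
    'proctitle=636174002F6574632F736861646F77\n'
    'type=SYSCALL msg=audit(1700000001.500:457): arch=c000003e syscall=59 '
    'success=yes exit=0 auid=1000 uid=1000 comm="ls" exe="/usr/bin/ls" key="exec"\n'
)

if __name__ == "__main__":
    for event_id, lines in group_by_event(RAW).items():
        print(event_id, len(lines), "record(s)")
    print(decode_proctitle("636174002F6574632F736861646F77"))
```

A plain grep for "shadow" would miss the PROCTITLE line above entirely, because the path only exists in hex until something decodes it.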

aureport turns the log into ranked summaries — the report you run at the start of an investigation to find where to point ausearch:

sudo aureport --summary -i              # overall event counts by type
sudo aureport --auth --summary -i       # authentication attempts, pass/fail
sudo aureport -x --summary -i           # executables, ranked by execution count
sudo aureport --failed -i               # everything that returned an error

Chain them: aureport finds the anomalous executable; you pivot to the raw events with ausearch. For ad-hoc filtering, ausearch --format csv or --format text produces output you can pipe into awk or load into a notebook without writing a log parser.
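For the notebook route, something as small as the following is often enough. It is a sketch that assumes you have already captured CSV text from ausearch; the column names in the sample are assumptions for illustration, so read the header row of your own output and pass whichever column you want to rank.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_text: str, column: str) -> Counter:
    """Count how often each value appears in one column of CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader if row.get(column))


# Illustrative rows; the column names are assumptions, not a documented schema
SAMPLE_CSV = """EVENT,SUBJ_PRIME,ACTION,OBJ_PRIME,RESULT
SYSCALL,alice,executed,/usr/bin/cat,success
SYSCALL,alice,executed,/usr/bin/curl,success
SYSCALL,bob,executed,/usr/bin/cat,success
"""

if __name__ == "__main__":
    for value, count in count_by_column(SAMPLE_CSV, "OBJ_PRIME").most_common():
        print(count, value)
```

Keeping the column name as a parameter means the same helper ranks users, objects, or results without committing to one layout.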

5. Why eBPF complements auditd

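auditd is authoritative but has structural limits that matter at scale: every matching event is copied to userspace over netlink and written out by the daemon, so cost rises with event rate; filtering is limited to the fixed fields the rule syntax exposes, with no aggregation before records leave the kernel; and the records carry no container or pod identity.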

eBPF closes all three gaps. A program attached to a kprobe or tracepoint runs in the kernel, filters and aggregates before any data crosses to userspace, and carries the full process and cgroup context — exactly where container and pod identity live. The result is an order-of-magnitude lower cost for high-cardinality signals like every exec and every connection, plus enrichment the audit subsystem cannot provide.

The division of labour is clean: keep auditd as the compliance-grade, file-and-privesc record (low volume, high assurance, immutable), and let eBPF own the high-volume runtime signals (exec, connect, file-open by container) where its cost profile and context win.

6. Deploying an eBPF runtime tool

Two production-grade options dominate. Falco (CNCF graduated) is the incumbent — a rules engine over syscalls with a large community rule set. Tetragon (from Cilium) is the newer entrant, built on the same eBPF foundation as Cilium, with first-class Kubernetes identity and in-kernel enforcement. I will show Tetragon for the Kubernetes-native shape and Falco for the rules-driven host shape; pick one.

Tetragon

Tetragon ships as a DaemonSet and, with zero policy, immediately emits enriched process-execution events:

helm repo add cilium https://helm.cilium.io
helm install tetragon cilium/tetragon -n kube-system

# Stream enriched exec/exit events, decoded
kubectl exec -n kube-system ds/tetragon -c tetragon -- \
  tetra getevents -o compact

You extend it with a TracingPolicy — a CRD that attaches eBPF probes to kernel functions or syscalls and filters in-kernel. This one observes writes to sensitive files and, critically, runs the filter in the kernel so userspace only ever sees the matches:

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "monitor-sensitive-writes"
spec:
  kprobes:
  - call: "security_file_permission"
    syscall: false
    args:
    - index: 0
      type: "file"
    - index: 1
      type: "int"
    selectors:
    - matchArgs:
      - index: 0
        operator: "Prefix"
        values:
        - "/etc/passwd"
        - "/etc/shadow"
      - index: 1
        operator: "Equal"
        values:
        - "2"          # MAY_WRITE

Hooking the LSM hook security_file_permission rather than the raw write syscall is the correct choice: it fires regardless of how the write was issued and gives you the resolved file argument directly. Tetragon can additionally take a matchActions of Sigkill to enforce in-kernel, but treat enforcement as a separate, carefully-staged rollout — start in observe-only.
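The second argument to security_file_permission is a permission bitmask, so it is worth being exact about the values. The sketch below decodes a mask using the four basic MAY_* constants from the kernel's include/linux/fs.h (MAY_EXEC 0x1, MAY_WRITE 0x2, MAY_READ 0x4, MAY_APPEND 0x8); the helper name is made up for this example and the kernel's other MAY_* flags are left out.

```python
MAY_EXEC = 0x1
MAY_WRITE = 0x2
MAY_READ = 0x4
MAY_APPEND = 0x8

_NAMES = {
    MAY_EXEC: "MAY_EXEC",
    MAY_WRITE: "MAY_WRITE",
    MAY_READ: "MAY_READ",
    MAY_APPEND: "MAY_APPEND",
}


def decode_mask(mask: int) -> list[str]:
    """Return the names of the basic permission bits set in mask."""
    return [name for bit, name in _NAMES.items() if mask & bit]


if __name__ == "__main__":
    for value in (2, 4, 6):
        print(value, decode_mask(value))
```

An equality comparison only matches the exact value listed, so a mask with more than one bit set would not match a single-bit value; if combined masks matter for your workload, check the Tetragon selector reference for a bitwise alternative.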

Falco

Falco is the better fit when you want a portable rules engine on bare hosts. Install it with the modern eBPF driver (no kernel module, no compilation):

# Run the official image with the CO-RE eBPF probe
docker run --rm -i -t \
  --privileged \
  -v /proc:/host/proc:ro \
  -v /etc:/host/etc:ro \
  -e FALCO_DRIVER=modern_ebpf \
  falcosecurity/falco:latest

A Falco rule is a condition over syscall fields plus an output template. This one alerts on a shell spawned inside a container — the textbook “interactive process in a thing that should be immutable” signal:

- rule: Terminal shell in container
  desc: A shell was spawned by a non-shell program in a container
  condition: >
    spawned_process and container
    and shell_procs and proc.tty != 0
    and not user_expected_terminal_shell_in_container_conditions
  output: >
    Shell spawned in container
    (user=%user.name container=%container.id
     image=%container.image.repository proc=%proc.cmdline)
  priority: WARNING
  tags: [container, shell, mitre_execution]

Falco’s value is the %container.id, %container.image.repository, and %k8s.pod.name fields — the enrichment auditd cannot produce — rendered straight into the alert.
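When Falco is configured for JSON output, that enrichment arrives as structured fields rather than only as rendered text. The sketch below pulls the container context out of one alert line, assuming the usual top-level keys (rule, priority, output_fields); the sample alert is hand-written for illustration and the summary layout is an arbitrary choice.

```python
import json


def summarize_alert(line: str) -> dict[str, str]:
    """Extract the rule name and container context from one Falco JSON alert."""
    alert = json.loads(line)
    fields = alert.get("output_fields", {})
    return {
        "rule": alert.get("rule", ""),
        "priority": alert.get("priority", ""),
        "container_id": fields.get("container.id", ""),
        "image": fields.get("container.image.repository", ""),
        "cmdline": fields.get("proc.cmdline", ""),
    }


# Hand-written sample alert (values are illustrative, not real output)
SAMPLE_ALERT = json.dumps({
    "rule": "Terminal shell in container",
    "priority": "Warning",
    "output": "Shell spawned in container (user=root ...)",
    "output_fields": {
        "user.name": "root",
        "container.id": "3f2a9c1b7d10",
        "container.image.repository": "registry.example.com/orders",
        "proc.cmdline": "bash",
    },
})

if __name__ == "__main__":
    print(summarize_alert(SAMPLE_ALERT))
```

Missing fields come back as empty strings rather than raising, which keeps a downstream pipeline running when a rule's output template omits one of them.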

7. Correlating process, network, and file events

A single event is rarely the alert. The signal is the sequence: a web process spawns a shell, the shell reads /etc/shadow, then opens an outbound connection to an unknown IP. Neither auditd nor a raw eBPF stream alerts on that chain by itself — you need a correlation layer keyed on a stable join field.

That join field is the process lineage: pid plus ppid, anchored by auid (from audit) or the cgroup/container ID (from eBPF). Both Tetragon and Falco emit the parent chain, which lets a downstream rule express the chain directly. The Falco condition for the canonical “reverse-shell precursor” — a network tool launched by a database or web server — looks like:

- rule: Unexpected outbound connection from server process
  desc: A process spawned by a long-running service initiated an outbound connection
  condition: >
    outbound and proc.pname in (nginx, postgres, mysqld)
    and not fd.sip in (allowed_egress_ips)
  output: >
    Outbound from service process
    (proc=%proc.name parent=%proc.pname
     dest=%fd.sip:%fd.sport container=%container.id)
  priority: CRITICAL

The pattern that scales is not to encode every correlation rule at the agent. Emit richly-attributed atomic events (each carrying pid/ppid/container.id/auid) from both auditd and eBPF, ship them to a central store, and run the stateful, multi-event correlation there — where you have the full timeline, can join across hosts, and can update detection logic without redeploying to 5,000 nodes.
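A minimal sketch of that central, stateful correlation follows, assuming events have already been normalised into one shape. The Event fields, the server and shell name sets, and the sample data are all hypothetical; a production version would also need to handle PID reuse, time windows, and child processes of the shell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: float
    host: str
    pid: int
    ppid: int
    kind: str    # "exec", "open", or "connect"
    detail: str  # executable name, file path, or destination IP


SERVERS = {"nginx", "postgres", "mysqld"}
SHELLS = {"sh", "bash", "dash"}


def find_chains(events: list[Event]) -> list[tuple[Event, Event, Event]]:
    """Find: server spawns shell, shell opens /etc/shadow, shell connects out."""
    name_of = {}       # (host, pid) -> last executable seen for that process
    shell_execs = {}   # (host, pid) -> exec event of a shell whose parent is a server
    shadow_reads = {}  # (host, pid) -> open event on /etc/shadow by such a shell
    chains = []
    for e in sorted(events, key=lambda ev: ev.ts):
        key = (e.host, e.pid)
        if e.kind == "exec":
            name_of[key] = e.detail
            if e.detail in SHELLS and name_of.get((e.host, e.ppid)) in SERVERS:
                shell_execs[key] = e
        elif e.kind == "open" and e.detail == "/etc/shadow" and key in shell_execs:
            shadow_reads[key] = e
        elif e.kind == "connect" and key in shadow_reads:
            chains.append((shell_execs[key], shadow_reads[key], e))
    return chains


# Hypothetical events: pid 200 completes the chain, pid 300 has no server parent
EVENTS = [
    Event(1.0, "web1", 100, 1, "exec", "nginx"),
    Event(2.0, "web1", 200, 100, "exec", "bash"),
    Event(2.5, "web1", 300, 50, "exec", "bash"),
    Event(3.0, "web1", 200, 100, "open", "/etc/shadow"),
    Event(3.5, "web1", 300, 50, "connect", "203.0.113.9"),
    Event(4.0, "web1", 200, 100, "connect", "203.0.113.9"),
]

if __name__ == "__main__":
    for shell, read, conn in find_chains(EVENTS):
        print(f"{shell.host}: pid {shell.pid} read {read.detail} then reached {conn.detail}")
```

Keying every lookup on (host, pid) is what lets the same logic run over events merged from many nodes without one host's PIDs colliding with another's.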

8. Shipping to a SIEM and tuning for performance

For auditd, the dispatcher plugin is the shipping mechanism. To forward every record to syslog (and onward to your collector), drop a plugin config under /etc/audit/plugins.d/:

# /etc/audit/plugins.d/syslog.conf
active = yes
direction = out
path = /sbin/audisp-syslog
type = always
args = LOG_LOCAL6
format = string

Then point rsyslog or vector at local6 and forward off-box. If you ship natively instead, prefer a collector that understands audit records and emits one reassembled event per audit ID, such as Auditbeat's auditd module or the LAUREL audisp plugin, over tailing audit.log with a generic file input, which will hand your SIEM half-events.

Tune auditd itself so it never silently drops records under load. In /etc/audit/auditd.conf:

# Sizes are in MB. Keep comments on their own lines in auditd.conf.
max_log_file = 50
# Keep 10 rotations on disk
num_logs = 10
max_log_file_action = ROTATE
# Below this much free space (MB), run space_left_action
space_left = 1024
# Warn, do not halt, on low disk
space_left_action = SYSLOG
# Stop auditing, do NOT kill the host
disk_full_action = SUSPEND
# Async writes; the right perf/safety balance
flush = INCREMENTAL_ASYNC

And raise the kernel backlog so a burst does not cause lost events — set it in 99-finalize.rules before the -e 2 immutability line:

## Kernel audit tuning (place before -e 2; keep comments on their own lines)
## backlog_limit: in-kernel queue depth
-b 8192
## Do not throttle syscalls waiting on backlog
--backlog_wait_time 0
## Make the rule set immutable until reboot
-e 2
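Choosing a backlog value is easier with a back-of-the-envelope estimate. The sketch below uses a simple rule of thumb that is an assumption of this example rather than guidance from the audit documentation: the queue must hold everything generated while the daemon is briefly unable to drain it, multiplied by a safety factor.

```python
import math


def required_backlog(records_per_second: float,
                     stall_seconds: float,
                     headroom: float = 2.0) -> int:
    """Queue slots needed to absorb a burst while the daemon is stalled."""
    return math.ceil(records_per_second * stall_seconds * headroom)


if __name__ == "__main__":
    # Hypothetical load: 2,000 records/s and a one-second stall during log rotation
    needed = required_backlog(2000, 1.0)
    print(needed, "slots needed; fits in -b 8192:", needed <= 8192)
```

Measure the real record rate first (aureport summaries over a known window are one way) and treat the stall duration and headroom as tunable guesses.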

disk_full_action is a genuine policy decision with no free answer. SUSPEND stops auditing but keeps the host serving — the right call for most workloads. HALT stops the host when it can no longer audit — mandated by some high-assurance regimes (and the reason an unmonitored space_left once took down a payments cluster). Choose deliberately and document why.

For eBPF, the performance lever is ring-buffer sizing and in-kernel filtering. Filter as early as possible (in the TracingPolicy selector or Falco condition) so events are dropped in the kernel, not shipped and discarded in userspace. Watch each agent’s drop counter — Falco exposes falco.n_drops, Tetragon exports event and map-pressure metrics on its Prometheus endpoint. A growing drop count means your buffers are too small for the event rate, and your detection has the same kind of holes a non-zero auditd lost does.
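Whether a drop counter is "growing" is a question about its rate, not its absolute value. A minimal sketch of that check is below, assuming you scrape the counter periodically into (timestamp, value) pairs; the function name and the treatment of counter resets are choices made for this example.

```python
def drop_rate(samples: list[tuple[float, int]]) -> float:
    """Average drops per second between the first and last (timestamp, counter) sample."""
    if len(samples) < 2:
        return 0.0
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    if t1 <= t0 or c1 < c0:
        # Non-increasing time, or the counter reset (for example an agent restart)
        return 0.0
    return (c1 - c0) / (t1 - t0)


if __name__ == "__main__":
    steady = [(0.0, 100), (60.0, 100)]
    climbing = [(0.0, 100), (60.0, 400)]
    print(drop_rate(steady), drop_rate(climbing))
```

A counter that is large but flat reflects a past burst; a counter with a positive rate means detection is losing events right now.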

Verify

Prove the whole pipeline end to end, from rule load to enriched alert.

# 1. auditd is healthy and NOT dropping records
sudo auditctl -s
#    -> enabled 2 (immutable), lost 0, backlog well under backlog_limit

# 2. Rules are loaded and immutable
sudo auditctl -l | head
#    -> your rules; 'enabled 2' above confirms immutability

# 3. Trigger a watched event and confirm capture + attribution
sudo touch /etc/sudoers.d/zz-test && sudo rm /etc/sudoers.d/zz-test
sudo ausearch -k privesc -i --start recent | tail -20
#    -> SYSCALL records with your auid, not just uid=root

# 4. Confirm aureport sees the executable activity
sudo aureport -x --summary -i | head

# 5. eBPF layer is emitting enriched events
kubectl exec -n kube-system ds/tetragon -c tetragon -- \
  tetra getevents -o compact | head
#    -> process_exec events carrying pod/container identity

# 6. Neither agent is dropping under load
#    auditd: 'lost 0' from step 1
#    falco:  curl -s localhost:8765/metrics | grep n_drops   # if metrics enabled

If step 1 shows lost > 0 or step 6 shows a climbing drop count, stop and fix buffers before trusting any detection — a control with gaps is worse than no control, because it manufactures false confidence.

Enterprise scenario

A payments platform team ran the CIS Level 2 audit baseline across roughly 4,000 RHEL 9 nodes, half of them container hosts running a high-throughput order-matching service. The baseline included unfiltered execve and connect syscall auditing. Within a week of rollout, two things broke: the matching service’s p99 latency rose by 11% under peak load, and auditctl -s was reporting lost events in the tens of thousands per hour on the busiest nodes — meaning the very SOX-mandated trail they had deployed had holes precisely when it mattered most.

The root cause was structural: the synchronous netlink-plus-write path could not keep up with a service making millions of short-lived connect() calls per minute, so the kernel both throttled syscalls (the latency hit) and overran the 64-deep default backlog (the dropped records).

The constraint they could not relax was the compliance requirement itself — process execution and outbound connections had to be auditable on those exact hosts.

The fix was to split the workload across the two layers along the cost boundary. They kept auditd for the file, identity, and privesc rules — low volume, high assurance, immutable — and removed the broad execve and connect syscall rules from auditd entirely. Those high-cardinality signals moved to Tetragon, where filtering happens in the kernel before any userspace crossing, and the events arrive already tagged with the pod and image. To keep auditd honest under the residual load, they raised the backlog and made the disk-full behaviour explicit rather than defaulting to a host halt:

# 99-finalize.rules: tuned, with execve/connect moved to the eBPF layer
## Backlog sized for the residual file/privesc load
-b 16384
## Never throttle a syscall on a full backlog
--backlog_wait_time 0
## Immutable until reboot
-e 2

Result: auditd lost returned to zero, p99 latency recovered to within 1% of the pre-rollout baseline, and the auditors accepted the design because both required signals remained continuously captured and attributable — execution and egress via Tetragon’s enriched stream, identity and privilege changes via the immutable kernel audit log. The lesson generalises: when a compliance baseline costs you availability, the answer is rarely to weaken the control — it is to move the expensive signal to the layer built to carry it cheaply.

