
Mastering systemd: Units, Timers, Resource Control, and Service Hardening

Most people poke systemd with systemctl start and never look further. That is a missed opportunity: a well-written unit file gives you restart policies, dependency ordering, resource caps, and a security sandbox for free — no supervisor daemon, no wrapper scripts, no init hacks. This is a practical walk through production-grade units, replacing cron with timers, fencing services into cgroups, and hardening them with the directives that actually move the needle.

Everything below assumes a modern distribution (systemd v245+, cgroup v2 as the default hierarchy). Check yours with systemctl --version and stat -fc %T /sys/fs/cgroup/ (you want cgroup2fs).

1. Unit anatomy: types, dependencies, and ordering

A unit is an INI-style file. Service units live in three directories, listed here from lowest to highest priority:

Path                        Purpose
/usr/lib/systemd/system/    Shipped by packages. Never edit directly.
/run/systemd/system/        Runtime-generated, volatile.
/etc/systemd/system/        Your local overrides and custom units. This wins.

The Type= in [Service] tells systemd when the unit is “started”. This is the single most misconfigured field:

“Started” is not “ready”. With Type=simple, a unit ordered After= yours may launch while yours is still binding its socket. Use Type=notify or socket activation for genuine readiness gating.
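For a service that is itself a shell script, readiness can be signalled with systemd-notify. The following is a minimal sketch, not a definitive implementation: start_listener is a hypothetical placeholder, and the unit is assumed to use Type=notify together with NotifyAccess=all, since systemd-notify runs as a child process rather than as the main PID.

```shell
#!/bin/sh
# Sketch of a Type=notify wrapper. start_listener is a hypothetical
# placeholder for whatever makes the service able to accept work.

start_listener() {
    # Hypothetical: bind sockets, warm caches, open DB connections...
    return 0
}

notify_ready() {
    # systemd sets NOTIFY_SOCKET only for units that accept notifications;
    # outside systemd (for example in a terminal) this is a no-op.
    if [ -n "${NOTIFY_SOCKET:-}" ]; then
        systemd-notify --ready --status="accepting connections"
    fi
}

start_and_signal() {
    start_listener || return 1   # never report READY on a failed start
    notify_ready
}
```

Units ordered After= this one are then held back until start_and_signal has sent READY=1, rather than being released the moment the process is forked.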

Dependencies vs. ordering

These are orthogonal, and conflating them causes most “works on reboot, breaks on restart” bugs.

You almost always need both. “Start my app after the database is up, and pull the database in if it isn’t” is:

[Unit]
Wants=postgresql.service
After=postgresql.service network-online.target

Note network-online.target, not network.target. The former blocks until the network is actually configured (it requires the relevant wait-online service to be enabled); the latter is only an ordering point for the networking stack. If your service needs a routable address at start, use network-online.target plus Wants=network-online.target.

The [Install] section

[Install] is inert until you run systemctl enable. It defines enable-time behaviour, typically:

[Install]
WantedBy=multi-user.target

enable reads WantedBy=multi-user.target and creates a symlink in multi-user.target.wants/, which is how the unit starts at boot. Without an [Install] section, enable has nothing to do and the unit never starts at boot unless another unit pulls it in. This is a common gotcha.
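To make the symlink mechanics concrete, here is a small simulation that runs entirely in a scratch directory. It only approximates what systemctl enable does for WantedBy=multi-user.target; the function names are made up for this illustration.

```shell
#!/bin/sh
# Simulate what "systemctl enable" does for WantedBy=multi-user.target,
# inside a scratch directory so nothing on the real system is touched.

simulate_enable() (
    root=$1   # scratch stand-in for /etc/systemd/system
    unit=$2   # e.g. widgetd.service
    mkdir -p "$root/multi-user.target.wants"
    ln -sf "$root/$unit" "$root/multi-user.target.wants/$unit"
)

is_enabled_in() (
    # "Enabled" here means: the symlink exists in the target's .wants/ dir.
    [ -L "$1/multi-user.target.wants/$2" ]
)
```

On a real host the equivalent checks are systemctl is-enabled widgetd.service and ls -l /etc/systemd/system/multi-user.target.wants/.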

2. Writing a production service unit

A complete, opinionated unit for a long-running daemon. Save it as /etc/systemd/system/widgetd.service.

[Unit]
Description=Widget API daemon
Documentation=https://internal.example.com/runbooks/widgetd
Wants=network-online.target
After=network-online.target postgresql.service
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
Type=notify
ExecStart=/usr/local/bin/widgetd --config /etc/widgetd/config.toml
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=2
TimeoutStartSec=30
TimeoutStopSec=20
WatchdogSec=30

# Run as an unprivileged, system-managed user
DynamicUser=yes
StateDirectory=widgetd
RuntimeDirectory=widgetd

[Install]
WantedBy=multi-user.target

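A few deliberate choices:

Type=notify with WatchdogSec=30: the daemon must send READY=1 via sd_notify once it can serve, then WATCHDOG=1 at least every 30 seconds. If the pings stop, systemd treats the service as hung and fails it, which Restart=on-failure then catches.
Restart=on-failure rather than always: a clean exit or a systemctl stop stays stopped; crashes, timeouts, and watchdog expiry trigger a restart after RestartSec=2.
StartLimitIntervalSec=60 and StartLimitBurst=5 live in [Unit]: more than five starts within 60 seconds leaves the unit failed instead of crash-looping.
DynamicUser=yes with StateDirectory= and RuntimeDirectory=: systemd allocates a transient UID and provides /var/lib/widgetd and /run/widgetd owned by it, so no user or directory has to be created by hand.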

Load and start it:

sudo systemctl daemon-reload
sudo systemctl enable --now widgetd.service
systemctl status widgetd.service

daemon-reload is mandatory after any unit change; systemd caches parsed units in memory.

3. Drop-in overrides and templated units

Drop-ins: never edit shipped units

To tweak a vendor unit, do not copy it into /etc. Use a drop-in, merged on top of the original. The clean way:

sudo systemctl edit nginx.service

This opens an editor for /etc/systemd/system/nginx.service.d/override.conf. Add only the deltas:

[Service]
LimitNOFILE=65536
Restart=on-failure
RestartSec=5

Drop-ins survive package upgrades because the base unit stays untouched. To verify the merge:

systemctl cat nginx.service          # shows base unit + all drop-ins
systemctl show nginx.service -p LimitNOFILE

Gotcha: most directives are last-one-wins, but list-valued ones such as ExecStart= accumulate instead of being overridden. To replace ExecStart=, first clear it with an empty ExecStart= line, then set the new value. Otherwise the unit ends up with two ExecStart= entries, which systemd refuses for every Type= except oneshot.

[Service]
ExecStart=
ExecStart=/usr/local/bin/nginx-wrapper
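If drop-ins are generated by provisioning scripts rather than systemctl edit, the reset-then-set pattern can be written out by a helper. This is a sketch under stated assumptions: the function name, the UNIT_ROOT variable, and the /tmp/unit-staging default are all hypothetical and chosen so that the example never writes to /etc unless told to.

```shell
#!/bin/sh
# Write a drop-in that replaces ExecStart= for a unit.
# UNIT_ROOT defaults to a scratch directory; point it at
# /etc/systemd/system (as root) to apply the override for real.

write_execstart_override() (
    unit=$1        # e.g. nginx.service
    new_cmd=$2     # absolute path plus arguments
    root=${UNIT_ROOT:-/tmp/unit-staging}
    dir="$root/$unit.d"
    mkdir -p "$dir"
    # The empty ExecStart= clears the inherited list before the new entry.
    printf '[Service]\nExecStart=\nExecStart=%s\n' "$new_cmd" > "$dir/override.conf"
    printf '%s\n' "$dir/override.conf"   # report where the file landed
)
```

After writing into the real unit directory, systemctl daemon-reload followed by systemctl cat nginx.service should show exactly one effective ExecStart=.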

Templated (instanced) units

A unit file with @ in its name is a template. The string after @ is the instance, available as %i (and %I unescaped). One template, many instances:

/etc/systemd/system/tunnel@.service:

[Unit]
Description=SSH tunnel to %i
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
ExecStart=/usr/bin/autossh -M 0 -N %i
Restart=always
RestartSec=10
DynamicUser=yes

[Install]
WantedBy=multi-user.target

Now start any number of instances from the one file:

sudo systemctl enable --now tunnel@db-replica.service
sudo systemctl enable --now tunnel@cache-01.service

Each gets independent state, logs, and lifecycle. Common specifiers: %i (instance), %n (full unit name), %H (hostname), %h (user home for user units). Use systemd-escape to safely build instance names containing slashes or special characters:

systemd-escape --template tunnel@.service "user@10.0.0.5"

4. Replacing cron with systemd timers

Timers beat cron for anything you want to observe: full journald logging, dependency ordering, resource control, and Persistent= to catch up on missed runs. A timer is two units — a .service that does the work and a .timer that triggers it.

/etc/systemd/system/backup.service:

[Unit]
Description=Nightly database backup
Wants=postgresql.service
After=postgresql.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/db-backup.sh
DynamicUser=yes
StateDirectory=db-backup

/etc/systemd/system/backup.timer:

[Unit]
Description=Run database backup nightly

[Timer]
OnCalendar=*-*-* 02:30:00
Persistent=true
RandomizedDelaySec=300
AccuracySec=1m

[Install]
WantedBy=timers.target

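Key fields:

OnCalendar=*-*-* 02:30:00 fires every day at 02:30 local time.
Persistent=true records the last run on disk; if the machine was off at 02:30, the job runs once as soon as the timer is next activated.
RandomizedDelaySec=300 adds a random delay of up to five minutes, so a fleet of hosts does not hit the database at the same instant.
AccuracySec=1m lets systemd coalesce wake-ups within a one-minute window (this is also the default; lower it only if you need tighter timing).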

Enable the timer, not the service:

sudo systemctl daemon-reload
sudo systemctl enable --now backup.timer

You can also drive a timer with OnBootSec= (after boot) or OnUnitActiveSec= (interval since the service last ran) for “every 15 minutes” schedules:

[Timer]
OnBootSec=15min
OnUnitActiveSec=15min
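When migrating existing crontab entries, the simplest schedules map mechanically onto OnCalendar=. The helper below is an illustrative sketch only: the function names are made up, and it deliberately handles nothing beyond plain numbers and "*" in the first four fields, rejecting ranges, lists, steps, and day-of-week values instead of guessing.

```shell
#!/bin/sh
# Convert a simple five-field cron schedule into an OnCalendar= expression.
# Only plain numbers and "*" are handled; anything else is rejected.

pad2() (
    v=$1
    case $v in
        '*') printf '*' ;;
        ''|*[!0-9]*) exit 1 ;;
        *)  # strip leading zeros so printf does not read the value as octal
            while [ "${#v}" -gt 1 ] && [ "${v#0}" != "$v" ]; do v=${v#0}; done
            printf '%02d' "$v" ;;
    esac
)

cron_to_oncalendar() (
    set -f                      # keep "*" fields from globbing
    set -- $1                   # split the cron line into fields
    [ $# -eq 5 ] || exit 1
    [ "$5" = '*' ] || exit 1    # day-of-week is not supported in this sketch
    min=$(pad2 "$1") && hour=$(pad2 "$2") &&
        dom=$(pad2 "$3") && mon=$(pad2 "$4") || exit 1
    printf '*-%s-%s %s:%s:00\n' "$mon" "$dom" "$hour" "$min"
)
```

For example, cron_to_oncalendar '30 2 * * *' prints *-*-* 02:30:00, the expression used in backup.timer. Whatever the helper produces is still worth checking with systemd-analyze calendar before it goes into a unit.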

5. Resource control with cgroup v2

Every service gets its own cgroup. On cgroup v2 you control resources with directives in [Service] (or a drop-in) that map directly onto kernel controllers — no ulimit guesswork.

[Service]
# CPU: cap at 1.5 cores; weight only matters under contention
CPUQuota=150%
CPUWeight=200

# Memory: throttle at 512M, OOM-kill the cgroup at 768M
MemoryHigh=512M
MemoryMax=768M

# Block IO, per device
IOWeight=100
IOWriteBandwidthMax=/dev/sda 20M

# Fork-bomb protection
TasksMax=256

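What each one does:

CPUQuota=150% is a hard ceiling of one and a half CPUs' worth of time. CPUWeight=200 is a relative share (the default is 100) and only matters when CPUs are contended.
MemoryHigh=512M is the soft limit: above it the kernel throttles the cgroup and reclaims aggressively. MemoryMax=768M is the hard limit: exceeding it invokes the OOM killer inside the cgroup.
IOWeight=100 is the relative share of block IO (100 is the default). IOWriteBandwidthMax= caps write throughput to the named device, here 20M bytes per second.
TasksMax=256 caps the number of tasks (processes and threads) the cgroup may hold.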

Apply via drop-in to tune without touching the unit, then inspect the live accounting:

sudo systemctl set-property widgetd.service MemoryMax=1G CPUQuota=200%
systemctl show widgetd.service -p MemoryMax -p CPUQuota
systemd-cgtop

set-property applies the change live and persists it as a drop-in under /etc/systemd/system.control/widgetd.service.d/ (add --runtime to keep it only until the next reboot). systemd-cgtop is top for cgroups: per-service CPU, memory, and IO, and the fastest way to find the noisy neighbour on a box.

Note: accounting must be on for the numbers to be real. Most distros enable MemoryAccounting/CPUAccounting by default, but if systemctl show reports zeros, set MemoryAccounting=yes and CPUAccounting=yes explicitly.
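To see how these directives land in the kernel, it can help to compare them with the raw cgroup v2 files. The sketch below assumes systemd's default 100ms CPU quota period and the usual /sys/fs/cgroup mount point; both function names are hypothetical.

```shell
#!/bin/sh
# Translate a CPUQuota= percentage into the "quota period" pair that
# cgroup v2 exposes in cpu.max, assuming the default 100ms period.

cpuquota_to_cpumax() (
    pct=${1%\%}                         # accept "150" or "150%"
    case $pct in ''|*[!0-9]*) exit 1 ;; esac
    period_us=100000                    # default CPUQuotaPeriodSec=100ms
    printf '%s %s\n' $((pct * period_us / 100)) "$period_us"
)

# Print the kernel's view of a unit's limits (needs a running systemd).
show_cgroup_limits() (
    cg=$(systemctl show -p ControlGroup --value "$1") || exit 1
    for f in cpu.max memory.high memory.max memory.current pids.max; do
        printf '%-15s %s\n' "$f" "$(cat "/sys/fs/cgroup$cg/$f" 2>/dev/null)"
    done
)
```

cpuquota_to_cpumax 150% prints 150000 100000, which is what cpu.max should contain for a unit with CPUQuota=150%. Running show_cgroup_limits widgetd.service on a live host lets you confirm that the other values match the unit file.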

6. Sandboxing and service hardening

This is where systemd quietly replaces a pile of AppArmor/SELinux work for common cases. Add these to [Service]; each line shrinks the blast radius of a compromised process.

[Service]
# Filesystem
# Comments must sit on their own line; systemd has no trailing comments.
# Entire filesystem read-only...
ProtectSystem=strict
# ...except these paths
ReadWritePaths=/var/lib/widgetd
ProtectHome=true
PrivateTmp=true

# Privilege
NoNewPrivileges=true
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE

# Kernel / device exposure
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
PrivateDevices=true
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
SystemCallFilter=@system-service
SystemCallArchitectures=native
LockPersonality=true
MemoryDenyWriteExecute=true

The high-value ones, in priority order:

systemd ships a scoring tool. After hardening, run systemd-analyze security widgetd.service. It rates each unit from “UNSAFE” to “OK” with an exposure score and a per-directive breakdown. Treat it as a checklist: drive the score down, retest the service after each change.
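Run without a unit argument, systemd-analyze security prints one overview row per service, which makes it easy to triage. The filter below is a sketch that assumes the overview layout of unit name in the first column and numeric exposure in the second; the function name and the 7.0 default threshold are arbitrary choices for illustration.

```shell
#!/bin/sh
# Filter "systemd-analyze security" overview output to units whose
# exposure score is at or above a threshold, worst first.

worst_exposed() (
    threshold=${1:-7.0}
    # Column 1 is the unit name, column 2 the numeric exposure score.
    # Rows whose second column is not a number (the header) are skipped.
    LC_ALL=C awk -v t="$threshold" \
        '$2 ~ /^[0-9]+(\.[0-9]+)?$/ && $2 + 0 >= t + 0 { print $2, $1 }' |
        LC_ALL=C sort -rn
)
```

Typical use would be systemd-analyze security --no-pager | worst_exposed 7.0, with the resulting list feeding the hardening backlog.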

Verify

Validate everything before declaring victory.

# Unit file syntax and warnings (catches typos, deprecated keys)
systemd-analyze verify /etc/systemd/system/widgetd.service

# Confirm the merged, effective configuration
systemctl cat widgetd.service
systemctl show widgetd.service -p Restart -p MemoryMax -p NoNewPrivileges

# Is it actually running and healthy?
systemctl status widgetd.service
systemctl is-active widgetd.service && echo OK

# Timers: list next/last run times, then dry-run the calendar expression
systemctl list-timers --all
systemd-analyze calendar "*-*-* 02:30:00" --iterations 5

# Trigger a oneshot timer's service manually without waiting for the clock
sudo systemctl start backup.service
journalctl -u backup.service -n 50 --no-pager

# Security posture
systemd-analyze security widgetd.service

systemd-analyze calendar printing five upcoming timestamps is the fastest way to prove an OnCalendar= expression means what you think, and systemd-analyze verify reports problems without starting anything.

Reading journald effectively

Logs are only useful if you can find the signal. journald gives structured, indexed filtering that grep over flat files cannot.

# Follow one unit, like tail -f
journalctl -u widgetd.service -f

# Current boot only, errors and worse
journalctl -u widgetd.service -b -p err

# Time-bounded
journalctl -u widgetd.service --since "2026-03-18 09:00" --until "2026-03-18 10:00"

# Previous boot only (-1), warnings and worse; useful for crash post-mortems (needs a persistent journal)
journalctl -b -1 -p warning

# Disk usage and retention
journalctl --disk-usage
sudo journalctl --vacuum-time=14d

Two configuration points that bite people:

Debugging boot, cycles, and failures

# What failed, and why
systemctl --failed
systemctl status <unit>          # last log lines + exit code/signal
journalctl -u <unit> -b

# Where is boot time going?
systemd-analyze                  # total firmware/loader/kernel/userspace time
systemd-analyze blame            # slowest units, descending
systemd-analyze critical-chain   # the ordering chain that gated boot

# Dependency cycles — systemd breaks them by dropping a unit and tells you which
journalctl -b | grep -i "found ordering cycle"

# Visualize what pulls in what
systemctl list-dependencies <unit>

Reach for systemd-analyze critical-chain when boot is slow: unlike blame (which lists slow units in isolation) it shows the serialized path that actually delayed reaching the target, accounting for parallelism. For ordering cycles, the journal names the units involved and which one systemd deleted to break the loop — fix it by relaxing an After=/Requires= rather than letting systemd choose for you.

Enterprise scenario

A payments platform ran a JVM settlement service under Restart=always with MemoryMax=4G. During a vendor outage the service wedged — threads alive, heap full, GC thrashing, but no exit. The process-alive check stayed green, so nothing restarted; settlements silently backed up for 40 minutes until on-call noticed the queue depth. The gotcha: Restart= only fires on process death, and a deadlocked-but-alive JVM never dies. MemoryMax made it worse — the cgroup OOM killer reaped the service mid-batch, and because Type=simple reported “started” at fork, the restart raced ahead of the readiness probe and took traffic before the DB pool reconnected.

The fix was a true liveness gate via the watchdog, not a process check. They switched to Type=notify, added a dedicated watchdog thread pinging sd_notify only when the work loop and DB pool were healthy, and split the soft/hard memory limits so the JVM could reclaim before being killed:

[Unit]
StartLimitIntervalSec=120
StartLimitBurst=4

[Service]
Type=notify
WatchdogSec=20
Restart=on-watchdog
RestartSec=5
MemoryHigh=3500M
MemoryMax=4G

The watchdog thread (Java):

// ping only when genuinely able to serve
if (workLoop.isHealthy() && dbPool.isReachable()) {
    sdNotify.notifyWatchdog(); // every WatchdogSec/2 = 10s
}

Now a wedge trips the watchdog in 20 seconds and Restart=on-watchdog cycles it, while MemoryHigh lets the heap shed under pressure before MemoryMax ever bites. StartLimitBurst=4 keeps a genuinely broken build from crash-looping into a thundering reconnect storm.
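The half-interval rule is not specific to the JVM. As a rough sketch for a shell-based service, the loop below derives its ping interval from WATCHDOG_USEC, which systemd exports when WatchdogSec= is set. The is_healthy function is a hypothetical stand-in for a real end-to-end check, and the unit is assumed to allow notifications from child processes (NotifyAccess=all).

```shell
#!/bin/sh
# Watchdog keep-alive for a shell-based service. is_healthy is a
# hypothetical placeholder for a real end-to-end health check.

is_healthy() {
    return 0
}

# systemd exports WATCHDOG_USEC (microseconds) when WatchdogSec= is set.
# Ping at half that interval, but never more often than once per second.
watchdog_interval_sec() (
    usec=${WATCHDOG_USEC:-0}
    case $usec in ''|*[!0-9]*) exit 1 ;; esac
    half=$((usec / 2 / 1000000))
    [ "$half" -ge 1 ] || half=1
    printf '%s\n' "$half"
)

watchdog_loop() {
    [ -n "${WATCHDOG_USEC:-}" ] || return 0   # watchdog not enabled
    interval=$(watchdog_interval_sec) || return 1
    while :; do
        # Skip the ping when unhealthy so the watchdog is allowed to expire.
        if is_healthy; then
            systemd-notify WATCHDOG=1
        fi
        sleep "$interval"
    done
}
```

With WatchdogSec=20, WATCHDOG_USEC is 20000000 and the loop pings every 10 seconds. The important design choice is the same as in the Java version: the ping is conditional on health, so a wedged service stops pinging and gets restarted.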

Checklist

Pitfalls and next steps

The recurring failure modes: forgetting daemon-reload after editing a unit (the change silently does nothing); putting StartLimit* in [Service] instead of [Unit] (ignored); using Requires= where Wants= belongs and cascading a non-critical failure into an outage; ordering on network.target when the service needs network-online.target; and over-hardening with SystemCallFilter until the service dies on an unrelated syscall — always test the running service after each hardening directive, not just that it starts.

From here, look at socket activation (.socket units) for on-demand start and zero-downtime restarts, systemd-run --scope/--slice for ad-hoc resource-controlled commands, and per-slice budgets to carve a box into tiers (a system-batch.slice with a hard CPU and memory ceiling so background jobs can never starve the foreground service). Run systemd-analyze security across the fleet and treat the worst scores as a backlog.
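As a starting point for the ad-hoc case, a thin wrapper around systemd-run can place any command in its own scope under the batch slice. This is a sketch under assumptions: the function name and the SYSTEMD_RUN override (useful for dry runs) are hypothetical, and the slice's own ceilings would still need to be set on system-batch.slice itself.

```shell
#!/bin/sh
# Run a command in its own scope under system-batch.slice with per-run
# memory and CPU ceilings. Set SYSTEMD_RUN=echo for a dry run that only
# prints the arguments that would be passed to systemd-run.

run_in_batch_slice() (
    mem=$1   # e.g. 2G
    cpu=$2   # e.g. 50%
    shift 2
    "${SYSTEMD_RUN:-systemd-run}" --scope --slice=system-batch.slice \
        -p "MemoryMax=$mem" -p "CPUQuota=$cpu" -- "$@"
)
```

For example, run_in_batch_slice 2G 50% tar czf /tmp/x.tgz /srv (as root) runs the archive job fenced off from the foreground services, and systemd-cgtop shows it under system-batch.slice while it runs.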
