Why Idempotency Matters: The Five Failure Modes of Non-Idempotent Scripts
You wrote a setup script. It worked. You run it again to apply a small config tweak — and it fails because the user already exists, the cron entry got duplicated, the symlink threw File exists, or the database migration ran twice and corrupted a counter. You add 2>/dev/null || true to suppress the errors, ship it, and now the script “succeeds” whether or not it actually did anything.
This is the most common operational tax in shell automation. Idempotency is the property that running a script N times produces the same end state as running it once. It is not a coding style; it is a contract with the runtime: “I describe what should be true. The script makes it true if it isn’t, and does nothing if it already is.”
The five failure modes you eliminate by building idempotency in from the start:
| Failure mode | Symptom | Real-world consequence |
|---|---|---|
| Duplicate appends | Cron has the same line three times after three runs | Job runs three times concurrently, locks fight, alerts fire |
| First-run-only logic | useradd succeeds once, fails forever after | Deploy pipeline goes red on the second run; team disables the check |
| Mid-run partial state | Script crashes between step 4 and step 5; rerunning fails because step 4 state is “already there” | Manual cleanup tickets every time CI flakes |
| Hidden drift | Config file matches manifest “by accident” — no one knows what controls it | Security audit finds /etc/sudoers.d/admin no one remembers writing |
| Re-run dread | Engineers refuse to re-run the script in prod because “last time it broke things” | Scripts become single-use; institutional knowledge dies; everything is manual |
The fix is structural, not cosmetic. Idempotency is built from four primitives that this lesson covers in depth:
- State-file markers: write a sentinel after every “did the work” branch so reruns see “already done.”
- Desired-vs-actual reconciliation: express what should be true, query what is true, apply only the delta.
- `--dry-run` discipline: every change-making function must support a no-op mode that prints what it would do.
- Idempotent primitives: use `ln -sfn`, `install -m`, `mkdir -p`, `sed -i` with sentinels, and atomic file writes instead of their non-idempotent equivalents.
By the end, you’ll have a lib/state.sh you can source into any script to make it convergent.
Ground Rules: What “Idempotent” Actually Means in Shell
Before we write code, let’s pin down the definitions, because “idempotent” gets used loosely.
Strict idempotency
f(state) = f(f(state)) = f(f(f(state)))
The script is a function from system state to system state. Applying it twice equals applying it once. The script reads the world, computes the delta, and applies the minimum set of changes.
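One way to make this property checkable is to run a step twice and compare a fingerprint of the state it touches. The sketch below assumes GNU `sha256sum` is available and uses hypothetical helper names (`snapshot`, `assert_idempotent`) that are not part of this lesson's library. It only tracks regular files under one directory, so treat it as a smoke test rather than a proof.

```shell
# Hypothetical helper: fingerprint every regular file under a directory
# (relative path + content hash), in a stable order.
snapshot() {
  local dir="$1"
  ( cd "$dir" && find . -type f -exec sha256sum {} + \
      | LC_ALL=C sort -k2 | sha256sum | awk '{print $1}' )
}

# Run "$@" twice; succeed only if the second run left $dir unchanged.
assert_idempotent() {
  local dir="$1"; shift
  local first second
  "$@"; first=$(snapshot "$dir")
  "$@"; second=$(snapshot "$dir")
  [[ "$first" == "$second" ]]
}

# Two toy steps to compare (illustrative only).
naive_append() {
  echo 'export APP_ENV=prod' >>"$1/profile"
}
guarded_append() {
  grep -qxF 'export APP_ENV=prod' "$1/profile" 2>/dev/null \
    || echo 'export APP_ENV=prod' >>"$1/profile"
}
```

Calling `assert_idempotent "$dir" guarded_append "$dir"` should succeed, while the same call with `naive_append` should fail because the second run adds another line.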
Convergent idempotency (what we usually build)
f(state) → desired_state, regardless of starting state
Slightly weaker but more practical: the script doesn’t care what state it started from; it reaches desired_state whether the system was empty, partially configured, or fully configured. Configuration management tools (Ansible, Puppet, Chef) are convergent.
What idempotency is NOT
- It is not “the script doesn’t crash on rerun.” Suppressing errors with `|| true` is the opposite of idempotent: it hides the fact that you no longer know what happened.
- It is not “the script is safe.” An `rm -rf /var/cache/myapp` is technically idempotent (the directory stays gone), but destroying running state on every rerun is not safe.
- It is not “Ansible’s job.” If your shell script wraps `terraform apply`, calls a REST API, or runs a database migration, it must be idempotent at its layer, regardless of what the upstream tool does.
The mental model: every change-making function is a state transition
Every function that changes the system answers four questions:
1. What is the desired state? (The arguments: `ensure_user "deploy" "/home/deploy" "/bin/bash"`.)
2. What is the current state? (Read it: `id deploy 2>/dev/null`.)
3. Is current == desired? (Compare; if yes, return without changing anything.)
4. If not, apply the minimal change to transition. (Run `useradd` or `usermod`.)
This four-step pattern is the heart of every idempotent function in this lesson.
The Four Pillars: State Markers, Reconciliation, Dry-Run, Primitives
Pillar 1 — State-file markers (the “I already did this” sentinel)
The simplest idempotency primitive: write a marker file when you complete a one-shot, expensive, or non-queryable action. On rerun, check for the marker and skip if present.
#!/usr/bin/env bash
set -Eeuo pipefail
STATE_DIR="/var/lib/myapp/state"
mkdir -p "$STATE_DIR"
# One-shot: download a 2 GB model file. We can check the file's existence,
# but we also want to record WHEN and WHICH version was installed, so a
# marker with metadata is more useful than just `[[ -f /path/to/file ]]`.
ensure_model_v2() {
local marker="$STATE_DIR/model-v2.installed"
if [[ -f "$marker" ]]; then
log "model-v2 already installed at $(cat "$marker")"
return 0
fi
log "downloading model-v2..."
curl -fsSL https://example.com/model-v2.bin -o /opt/myapp/model.bin
echo "$(date -u +%FT%TZ) sha256=$(sha256sum /opt/myapp/model.bin | awk '{print $1}')" >"$marker"
log "model-v2 installed"
}
Why this works:
- The marker is idempotent across crashes: if `curl` fails, no marker is written, so the next run retries from scratch.
- The marker is auditable: it records when and what version, so later you can ask “when did we install v2?” without git-blame archaeology.
- The marker is cheap: a file existence check is a single `stat` syscall; it’s faster than re-validating a 2 GB file’s checksum on every run.
State-file rules:
- Write the marker last. Never write the marker before the work succeeds; otherwise a crash mid-work leaves a marker that lies about state.
- Include version/identity in the marker name. Use `model-v2.installed`, not `model.installed`: the v2 marker won’t satisfy a future “ensure v3” check.
- Keep markers under one well-known directory: `/var/lib/myapp/state` for system services, `~/.local/state/myapp` for user scripts, `$STATE_DIR` for tests. Easy to clear, easy to inspect, easy to back up.
- Don’t put markers in `/tmp`. `/tmp` gets cleared on reboot on many distros (systemd-tmpfiles); your “already installed” claim evaporates.
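The directory and naming rules above can be captured in two small helpers. This is a minimal sketch: `resolve_state_dir` and `marker_path` are hypothetical names, and the non-root branch assumes the XDG convention of `$XDG_STATE_HOME` defaulting to `~/.local/state`.

```shell
# Hypothetical helper: choose a marker directory for app "$1".
#   explicit STATE_DIR -> used as-is (handy in tests)
#   root               -> /var/lib/<app>/state
#   non-root           -> ${XDG_STATE_HOME:-$HOME/.local/state}/<app>
resolve_state_dir() {
  local app="$1"
  if [[ -n "${STATE_DIR:-}" ]]; then
    printf '%s\n' "$STATE_DIR"
  elif (( EUID == 0 )); then
    printf '/var/lib/%s/state\n' "$app"
  else
    printf '%s/%s\n' "${XDG_STATE_HOME:-$HOME/.local/state}" "$app"
  fi
}

# Marker names carry the version, so a v2 marker never satisfies a v3 check.
marker_path() {
  local app="$1" action="$2" version="$3"
  printf '%s/%s-%s.installed\n' "$(resolve_state_dir "$app")" "$action" "$version"
}
```

With `STATE_DIR=/srv/test-state`, `marker_path myapp model v2` prints `/srv/test-state/model-v2.installed`.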
Pillar 2 — Desired-vs-actual reconciliation (drift detection)
State markers work for one-shot operations. For ongoing config (a user, a file, a sysctl value, a cron entry), markers are wrong because the config could drift after the marker was written. Someone could delete the user, edit the file, override the sysctl. The script needs to read the current state on every run and reconcile.
# The reconciliation pattern, applied to a user account.
ensure_user() {
  local user="$1" home="$2" shell="${3:-/bin/bash}"
  local current_home current_shell

  if id -u "$user" &>/dev/null; then
    # Read actual state.
    current_home=$(getent passwd "$user" | cut -d: -f6)
    current_shell=$(getent passwd "$user" | cut -d: -f7)

    # Compare. If everything matches, no-op.
    if [[ "$current_home" == "$home" && "$current_shell" == "$shell" ]]; then
      log "user $user already correct"
      return 0
    fi

    # Apply the delta only. Use `if`, not `[[ ... ]] && cmd`: a false test on
    # the last line would make the function return 1 and trip `set -e`.
    log "user $user exists but has drifted; applying $home, $shell"
    if [[ "$current_home" != "$home" ]]; then
      usermod --home "$home" --move-home "$user"
    fi
    if [[ "$current_shell" != "$shell" ]]; then
      usermod --shell "$shell" "$user"
    fi
  else
    log "creating user $user"
    useradd --home-dir "$home" --shell "$shell" --create-home "$user"
  fi
}
The four-step reconciliation pattern, made explicit:
┌─────────────────────────────────────────────┐
│ 1. Desired state from arguments │ ensure_user "deploy" "/home/deploy" "/bin/bash"
└─────────────┬───────────────────────────────┘
▼
┌─────────────────────────────────────────────┐
│ 2. Read actual state from the system │ getent passwd deploy
└─────────────┬───────────────────────────────┘
▼
┌─────────────────────────────────────────────┐
│ 3. Compute delta (desired - actual) │ home differs? shell differs? exists?
└─────────────┬───────────────────────────────┘
▼
┌──────────────┴──────────────────────────────┐
▼ delta empty ▼ delta non-empty
"already correct" apply minimal change
return 0 (useradd or usermod)
This pattern scales. The same skeleton works for files (compare_file — checksum the desired vs actual, atomic-write if different), services (ensure_service_running — systemctl is-active, then systemctl start if needed), packages (ensure_pkg_installed — dpkg-query then apt-get install), and remote API state (ensure_dns_record — query the API, then PATCH only the changed fields).
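As one more instance of the same skeleton, here is a sketch that reconciles a single `key = value` setting in a plain config file. The helper name `ensure_kv` is hypothetical, and the code assumes one setting per line, keys without backslashes, and a file format that tolerates reordering (a changed key moves to the end of the file).

```shell
# Hypothetical helper: make sure "key = value" is set in a simple config file.
# Prints "unchanged" or "changed" so callers can see which branch ran.
ensure_kv() {
  local file="$1" key="$2" desired="$3" tmp

  # Steps 2+3: read actual state and compare with desired.
  if [[ -f "$file" ]] && grep -qxF "$key = $desired" "$file"; then
    echo "unchanged"
    return 0
  fi

  # Step 4: apply the delta. Copy every line that does not define this key,
  # append the desired definition, then rename the result into place.
  # Note: mktemp creates the file with mode 0600; adjust with chmod if needed.
  tmp=$(mktemp "${file}.XXXXXX")
  if [[ -f "$file" ]]; then
    awk -v k="$key" 'index($0, k " = ") != 1' "$file" >"$tmp"
  fi
  printf '%s = %s\n' "$key" "$desired" >>"$tmp"
  mv -f "$tmp" "$file"
  echo "changed"
}
```

Running `ensure_kv app.conf log_level info` twice in a row should print `changed` and then `unchanged`.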
Pillar 3 — Dry-run discipline (the no-op mode)
Every change-making function must support --dry-run. Two reasons:
- Operator safety. Before applying a script in production, the operator wants to see the diff: “what would this change?” A dry-run that prints “would create user deploy, would write /etc/cron.d/myjob, would not change /etc/hosts” is the audit log before the fact.
- CI integration. PR checks should run the script against a snapshot of prod state with `--dry-run` and fail if the diff is unexpected.
The pattern: a global DRY_RUN=0 flag, and a wrapper do_or_say that gates every mutating call.
DRY_RUN=0
# Parse args; $1 is "--dry-run" if present.
[[ "${1:-}" == "--dry-run" ]] && { DRY_RUN=1; shift; }
# Wrapper: in dry-run, print the command instead of running it.
do_or_say() {
if (( DRY_RUN )); then
printf 'WOULD: %s\n' "$*" >&2
else
"$@"
fi
}
ensure_user_dry() {
local user="$1" home="$2"
if id -u "$user" &>/dev/null; then
log "user $user exists"
else
do_or_say useradd --home-dir "$home" --create-home "$user"
fi
}
Dry-run rules:
- Reads stay live. `id -u`, `getent`, `systemctl is-active`, `curl` GET requests: all run normally in dry-run. The point is to see what changes would happen given the current real state.
- Writes are gated. Every `useradd`, `usermod`, `cp`, `chmod`, `systemctl start`, and `curl` POST is wrapped in `do_or_say`.
- Output is unambiguous. Print `WOULD: useradd ...`, not `# useradd ...` (looks like a comment) and not `useradd ... # skipped` (parsers will run it). The `WOULD:` prefix is greppable; because `do_or_say` prints to stderr, merge the streams first: `myscript --dry-run 2>&1 | grep ^WOULD`.
- Dry-run must reach the same code paths as live. If `--dry-run` skips entire sections of the script, you’re testing nothing. The `do_or_say` wrapper means the control flow is identical; only the side-effecting commands are gated.
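The CI use case can be sketched as a small gate that compares the dry-run plan with an approved list. The function name `check_dry_run` and the idea of a checked-in file of expected `WOULD:` lines are assumptions for illustration; the only thing taken from the lesson is the `WOULD:` prefix.

```shell
# Hypothetical CI gate: run a command with --dry-run, collect its "WOULD:"
# lines from stdout and stderr, and compare them with an approved list.
#   $1      = file containing the expected WOULD lines
#   $2...   = command to run (--dry-run is appended)
check_dry_run() {
  local expected_file="$1"; shift
  local actual_file status=0
  actual_file=$(mktemp)

  # grep exits 1 when nothing matches; an empty plan is still a valid plan.
  "$@" --dry-run 2>&1 | { grep '^WOULD: ' || true; } | LC_ALL=C sort >"$actual_file"

  if diff -u <(LC_ALL=C sort "$expected_file") "$actual_file"; then
    echo "dry-run plan matches expectation"
  else
    echo "unexpected dry-run plan (diff above)" >&2
    status=1
  fi
  rm -f "$actual_file"
  return "$status"
}
```

A pipeline step could then call `check_dry_run expected-plan.txt ./apply.sh` and fail the build on a non-zero exit.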
Pillar 4 — Idempotent primitives (use the right tool)
Many shell tools have idempotent and non-idempotent variants. Use the idempotent one by default, even when you “know” the script will only run once.
| Non-idempotent | Idempotent equivalent | Why |
|---|---|---|
| `ln -s target link` | `ln -sfn target link` | `-f` replaces an existing link; `-n` stops `ln` from following an existing symlink to a directory (which would create the new link inside that directory) |
| `mkdir /opt/myapp` | `mkdir -p /opt/myapp` | `-p` succeeds if dir exists; creates parents |
| `cp file /etc/myconf` | `install -m 0644 -o root -g root file /etc/myconf` | `install` sets perms+owner; idempotent w.r.t. mode/ownership drift |
| `echo 'line' >> file` | sentinel + `sed -i` (see below) | `>>` appends every run; sentinel detects existing |
| `useradd user` | `id -u user &>/dev/null \|\| useradd user` | guarded create: read first, add only if missing |
| `groupadd grp` | `getent group grp >/dev/null \|\| groupadd grp` | same pattern |
| `git clone url dir` | `[[ -d dir/.git ]] && git -C dir pull \|\| git clone url dir` | clone fails if dir exists; reconcile via pull |
| `rm /tmp/foo` | `rm -f /tmp/foo` | `-f` doesn’t fail if missing; idempotent removal |
| `kill $pid` | `kill $pid 2>/dev/null \|\| true` | better: check if running first |
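The `ln` row is the least obvious one, so here is a short demonstration in a scratch directory. The directory names are illustrative; the behavior shown is that of `ln` when the existing link points at a directory.

```shell
# Scratch area; nothing outside $demo is touched.
demo=$(mktemp -d)
mkdir "$demo/releases-v1" "$demo/releases-v2"
ln -s releases-v1 "$demo/current"          # first deploy

# Without -n: ln follows "current" into releases-v1 and creates the new
# link INSIDE that directory. "current" itself is left untouched.
ln -sf releases-v2 "$demo/current"
readlink "$demo/current"                   # still releases-v1
ls "$demo/releases-v1"                     # shows a stray "releases-v2" link

# With -n: the existing symlink itself is replaced.
rm -f "$demo/releases-v1/releases-v2"      # clean up the stray link
ln -sfn releases-v2 "$demo/current"
readlink "$demo/current"                   # now releases-v2

# Rerun: same end state, no error, no stray link.
ln -sfn releases-v2 "$demo/current"
```

The stray link in the middle step is exactly the kind of silent drift that makes a deploy appear to succeed while `current` still points at the old release.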
The append-only-if-missing pattern (sentinel comments):
# Add a line to a file only if it's not already there. The sentinel comment
# uniquely identifies our line so we can find it on rerun.
ensure_line_in_file() {
  local file="$1" line="$2" sentinel="$3"
  # sentinel example: "# managed-by:myapp:cron-job"
  if grep -qF "$sentinel" "$file" 2>/dev/null; then
    # Line is already there. Update it in place if content differs.
    local current escaped
    current=$(grep -F "$sentinel" "$file" | head -n1)
    if [[ "$current" != "$line" ]]; then
      # Escape regex metacharacters and the "/" delimiter in the sentinel,
      # then replace the matching line. Assumes $line has no backslashes.
      escaped=$(printf '%s' "$sentinel" | sed 's:[][\\/.^$*]:\\&:g')
      sed -i.bak "/$escaped/c\\
$line" "$file"
      rm -f "${file}.bak"   # don't leave a .bak behind on every change
    fi
  else
    printf '%s\n' "$line" >>"$file"
  fi
}
# Usage:
ensure_line_in_file /etc/cron.d/myapp \
"*/5 * * * * deploy /opt/myapp/bin/heartbeat # managed-by:myapp:heartbeat" \
"managed-by:myapp:heartbeat"
Atomic file writes (the mv-into-place pattern):
# Idempotent file generation. Compare desired content to current, write only on diff.
ensure_file_content() {
  local path="$1" mode="$2" owner="$3" group="$4"
  local desired_content
  desired_content=$(cat)   # read from stdin; $(...) strips trailing newlines

  # Compare. Re-add the final newline so the file always ends with one.
  if [[ -f "$path" ]] && diff -q <(printf '%s\n' "$desired_content") "$path" >/dev/null; then
    log "$path: content unchanged"
    return 0
  fi

  # Write atomically: temp file in the same directory, then rename.
  # (install copies onto the destination, so a reader could see a partial
  # file; mv within one filesystem is a single rename.)
  local tmp
  tmp=$(mktemp "${path}.XXXXXX")
  printf '%s\n' "$desired_content" >"$tmp"
  chown "$owner:$group" "$tmp"
  chmod "$mode" "$tmp"
  mv -f "$tmp" "$path"
  log "$path: updated"
}
# Usage:
ensure_file_content /etc/myapp/server.conf 0644 root root <<'EOF'
listen_port = 8080
log_level = info
EOF
Why atomic writes: if the script crashes mid-write with >"$path", the file is truncated and the service that reads it sees garbage. Writing to ${path}.XXXXXX and renaming is atomic on POSIX filesystems — readers see either the old file or the new one, never a partial.
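A minimal sketch of the temp-file-plus-rename idea, reduced to its core. The helper name `write_atomic` is hypothetical, and the sketch deliberately leaves out ownership changes and any `fsync` step; it assumes the temp file and the destination are on the same filesystem, which holds because the temp file is created next to the destination.

```shell
# Hypothetical helper: atomically replace $1 with the content on stdin.
#   $1 = destination path, $2 = mode (default 0644)
write_atomic() {
  local path="$1" mode="${2:-0644}" tmp
  tmp=$(mktemp "${path}.XXXXXX") || return 1

  # Build the complete file first; only a fully written file is renamed.
  if cat >"$tmp" && chmod "$mode" "$tmp" && mv -f "$tmp" "$path"; then
    return 0
  fi

  # Failed part-way: the old file is untouched, so just drop the temp.
  rm -f "$tmp"
  return 1
}
```

Usage: `printf 'listen_port = 8080\n' | write_atomic /etc/myapp/server.conf 0644`. If the write fails, the previous file is still in place and no temp file is left behind.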
A Drop-In Library: lib/state.sh
Compose the patterns above into a reusable library. Source it from any idempotent script.
# lib/state.sh — drop-in idempotency helpers.
# Source from any script: . /opt/myapp/lib/state.sh
# ─── Configuration (override before sourcing if needed) ────────────────────
: "${STATE_DIR:=/var/lib/$(basename "${0##*/}" .sh)/state}"
: "${DRY_RUN:=0}"
: "${LOG_PREFIX:=$(basename "${0##*/}" .sh)}"
# ─── Logging (timestamped, structured-ish) ─────────────────────────────────
log() { printf '[%s] [%s] [INFO] %s\n' "$(date -u +%FT%TZ)" "$LOG_PREFIX" "$*" >&2; }
warn() { printf '[%s] [%s] [WARN] %s\n' "$(date -u +%FT%TZ)" "$LOG_PREFIX" "$*" >&2; }
err() { printf '[%s] [%s] [ERROR] %s\n' "$(date -u +%FT%TZ)" "$LOG_PREFIX" "$*" >&2; }
chg() { printf '[%s] [%s] [CHANGE] %s\n' "$(date -u +%FT%TZ)" "$LOG_PREFIX" "$*" >&2; }
# ─── Dry-run wrapper ───────────────────────────────────────────────────────
# Run the command, or print "WOULD: ..." in dry-run mode.
do_or_say() {
if (( DRY_RUN )); then
printf 'WOULD: %s\n' "$*" >&2
return 0
fi
"$@"
}
# ─── Init ──────────────────────────────────────────────────────────────────
state_init() {
if (( ! DRY_RUN )); then
mkdir -p "$STATE_DIR"
chmod 0700 "$STATE_DIR"
fi
}
# ─── State markers ─────────────────────────────────────────────────────────
state_marker_path() { printf '%s/%s.marker' "$STATE_DIR" "$1"; }
state_marker_exists() { [[ -f "$(state_marker_path "$1")" ]]; }
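state_marker_set() {
  local name="$1" content="${2:-$(date -u +%FT%TZ)}"
  # Pass values as positional parameters instead of splicing them into the
  # bash -c string, so quotes in $content or the path cannot break the command.
  do_or_say bash -c 'umask 077; printf "%s\n" "$1" >"$2"' _ \
    "$content" "$(state_marker_path "$name")"
}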
state_marker_clear() {
do_or_say rm -f "$(state_marker_path "$1")"
}
# Run $@ once-and-only-once across reruns; mark with $name.
state_once() {
  local name="$1"; shift
  if state_marker_exists "$name"; then
    log "skip ($name): already done at $(cat "$(state_marker_path "$name")" 2>/dev/null || echo unknown)"
    return 0
  fi
  # In dry-run, announce the work instead of running it.
  if (( DRY_RUN )); then
    printf 'WOULD: %s\n' "$*" >&2
    return 0
  fi
  log "running ($name): $*"
  if "$@"; then
    state_marker_set "$name"
    chg "completed: $name"
  else
    err "failed: $name (no marker written; will retry next run)"
    return 1
  fi
}
# ─── Idempotent primitives ─────────────────────────────────────────────────
# Ensure a directory exists with mode/owner/group.
ensure_dir() {
  local path="$1" mode="${2:-0755}" owner="${3:-root}" group="${4:-root}"
  if [[ ! -d "$path" ]]; then
    do_or_say install -d -m "$mode" -o "$owner" -g "$group" "$path"
    chg "created dir $path ($mode $owner:$group)"
  else
    # Reconcile mode/owner/group (GNU stat first, BSD stat as fallback).
    local cur_mode cur_owner cur_group
    cur_mode=$(stat -c %a "$path" 2>/dev/null || stat -f %A "$path")
    cur_owner=$(stat -c %U "$path" 2>/dev/null || stat -f %Su "$path")
    cur_group=$(stat -c %G "$path" 2>/dev/null || stat -f %Sg "$path")
    # `if` rather than `[[ ... ]] && ...`: a false test on the final line
    # would become the function's exit status and abort callers under set -e.
    if [[ "$cur_mode" != "${mode#0}" ]]; then
      do_or_say chmod "$mode" "$path"
      chg "chmod $path $mode"
    fi
    if [[ "$cur_owner" != "$owner" ]]; then
      do_or_say chown "$owner" "$path"
      chg "chown $path $owner"
    fi
    if [[ "$cur_group" != "$group" ]]; then
      do_or_say chgrp "$group" "$path"
      chg "chgrp $path $group"
    fi
  fi
}
# Ensure a symlink target -> link, idempotent and no-prompt.
ensure_symlink() {
local target="$1" link="$2"
if [[ -L "$link" ]]; then
local current
current=$(readlink "$link")
[[ "$current" == "$target" ]] && return 0
chg "symlink $link drifted ($current -> $target)"
elif [[ -e "$link" ]]; then
err "$link exists and is not a symlink; refusing to clobber"
return 1
fi
do_or_say ln -sfn "$target" "$link"
chg "symlink $link -> $target"
}
# Ensure file content matches stdin; preserve mode/owner/group.
ensure_file() {
  local path="$1" mode="${2:-0644}" owner="${3:-root}" group="${4:-root}"
  local desired tmp
  desired=$(cat)   # $(...) strips trailing newlines; one is re-added on write
  if [[ -f "$path" ]] && diff -q <(printf '%s\n' "$desired") "$path" >/dev/null 2>&1; then
    # Content matches; reconcile mode/owner only.
    local cur_mode cur_owner cur_group
    cur_mode=$(stat -c %a "$path" 2>/dev/null || stat -f %A "$path")
    cur_owner=$(stat -c %U "$path" 2>/dev/null || stat -f %Su "$path")
    cur_group=$(stat -c %G "$path" 2>/dev/null || stat -f %Sg "$path")
    if [[ "$cur_mode" != "${mode#0}" ]]; then
      do_or_say chmod "$mode" "$path"; chg "chmod $path $mode"
    fi
    if [[ "$cur_owner" != "$owner" ]]; then
      do_or_say chown "$owner" "$path"; chg "chown $path $owner"
    fi
    if [[ "$cur_group" != "$group" ]]; then
      do_or_say chgrp "$group" "$path"; chg "chgrp $path $group"
    fi
    return 0
  fi
  if (( DRY_RUN )); then
    printf 'WOULD: write %s (mode %s owner %s:%s, %d bytes)\n' \
      "$path" "$mode" "$owner" "$group" "${#desired}" >&2
    return 0
  fi
  # Atomic write: build the file next to its destination, then rename it
  # into place. (install would copy onto $path, which is not atomic.)
  tmp=$(mktemp "${path}.XXXXXX")
  trap 'rm -f "$tmp"' EXIT   # note: replaces any EXIT trap the caller set
  printf '%s\n' "$desired" >"$tmp"
  chown "$owner:$group" "$tmp"
  chmod "$mode" "$tmp"
  mv -f "$tmp" "$path"
  trap - EXIT
  chg "wrote $path ($mode $owner:$group, ${#desired} bytes)"
}
# Ensure a line is in a file, identified by a sentinel comment.
ensure_line() {
  local file="$1" line="$2" sentinel="$3"
  if [[ ! -f "$file" ]]; then
    # Positional parameters keep quotes in $line from breaking the command.
    do_or_say bash -c 'printf "%s\n" "$1" >"$2"' _ "$line" "$file"
    chg "created $file with $sentinel"
    return 0
  fi
  if grep -qF "$sentinel" "$file"; then
    local current
    current=$(grep -F "$sentinel" "$file" | head -n1)
    if [[ "$current" == "$line" ]]; then
      return 0   # already correct
    fi
    # Replace the line containing the sentinel.
    if (( DRY_RUN )); then
      printf 'WOULD: replace line in %s containing "%s" with "%s"\n' "$file" "$sentinel" "$line" >&2
      return 0
    fi
    local escaped_sentinel
    escaped_sentinel=$(printf '%s' "$sentinel" | sed 's:[][\\/.^$*]:\\&:g')
    sed -i.bak "/$escaped_sentinel/c\\
$line" "$file"
    rm -f "${file}.bak"
    chg "updated line in $file ($sentinel)"
  else
    do_or_say bash -c 'printf "%s\n" "$1" >>"$2"' _ "$line" "$file"
    chg "appended line to $file ($sentinel)"
  fi
}
# Ensure a user exists with home and shell.
ensure_user() {
  local user="$1" home="$2" shell="${3:-/bin/bash}"
  if id -u "$user" &>/dev/null; then
    local cur_home cur_shell
    cur_home=$(getent passwd "$user" | cut -d: -f6)
    cur_shell=$(getent passwd "$user" | cut -d: -f7)
    # `if` blocks keep the function's exit status at 0 when nothing drifted.
    if [[ "$cur_home" != "$home" ]]; then
      do_or_say usermod --home "$home" --move-home "$user"
      chg "user $user home -> $home"
    fi
    if [[ "$cur_shell" != "$shell" ]]; then
      do_or_say usermod --shell "$shell" "$user"
      chg "user $user shell -> $shell"
    fi
  else
    do_or_say useradd --home-dir "$home" --shell "$shell" --create-home "$user"
    chg "created user $user"
  fi
}
# Ensure a systemd service is enabled and running.
ensure_service() {
local svc="$1" state="${2:-running}" # running | stopped
local enabled="${3:-enabled}" # enabled | disabled
case "$enabled" in
enabled)
systemctl is-enabled "$svc" &>/dev/null || \
{ do_or_say systemctl enable "$svc"; chg "enabled $svc"; }
;;
disabled)
systemctl is-enabled "$svc" &>/dev/null && \
{ do_or_say systemctl disable "$svc"; chg "disabled $svc"; } || true
;;
esac
case "$state" in
running)
systemctl is-active "$svc" &>/dev/null || \
{ do_or_say systemctl start "$svc"; chg "started $svc"; }
;;
stopped)
systemctl is-active "$svc" &>/dev/null && \
{ do_or_say systemctl stop "$svc"; chg "stopped $svc"; } || true
;;
esac
}
Real-World Recipes
Recipe 1: Idempotent package installation across distros
# Source lib/state.sh first.
ensure_pkg() {
local pkg="$1"
if command -v dpkg-query &>/dev/null; then
if dpkg-query -W -f='${Status}\n' "$pkg" 2>/dev/null | grep -q "ok installed"; then
return 0
fi
do_or_say apt-get install -y "$pkg"
chg "installed $pkg (apt)"
elif command -v rpm &>/dev/null; then
if rpm -q "$pkg" &>/dev/null; then
return 0
fi
do_or_say dnf install -y "$pkg" 2>/dev/null || do_or_say yum install -y "$pkg"
chg "installed $pkg (rpm)"
else
err "no supported package manager"; return 1
fi
}
ensure_pkg curl
ensure_pkg jq
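# Package names differ by distro family, not by kernel (uname -s is "Linux"
# on both): iproute2 on Debian/Ubuntu, iproute on RHEL/Fedora.
ensure_pkg "$(command -v dpkg-query &>/dev/null && echo iproute2 || echo iproute)"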
Recipe 2: Idempotent cron entry with sentinel
ensure_dir /etc/cron.d 0755 root root
ensure_line /etc/cron.d/myapp \
'*/5 * * * * deploy /opt/myapp/bin/heartbeat >/dev/null 2>&1 # managed-by:myapp:heartbeat' \
'managed-by:myapp:heartbeat'
# Removing a managed cron entry is also idempotent:
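remove_line() {
  local file="$1" sentinel="$2" escaped
  [[ -f "$file" ]] || return 0
  grep -qF "$sentinel" "$file" || return 0
  # Escape regex metacharacters and "/" so the sentinel matches literally.
  escaped=$(printf '%s' "$sentinel" | sed 's:[][\\/.^$*]:\\&:g')
  do_or_say sed -i.bak "/$escaped/d" "$file"
  rm -f "${file}.bak"
  chg "removed line from $file ($sentinel)"
}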
Recipe 3: Idempotent remote API state (DNS record)
# Reconcile a DNS A record via Cloudflare API.
ensure_dns_a_record() {
local zone_id="$1" name="$2" desired_ip="$3" cf_token="${CLOUDFLARE_API_TOKEN:?}"
# 1. Read current state.
local current_json current_id current_ip
current_json=$(curl -fsS \
-H "Authorization: Bearer $cf_token" \
"https://api.cloudflare.com/client/v4/zones/$zone_id/dns_records?name=$name&type=A")
current_id=$(jq -r '.result[0].id // empty' <<<"$current_json")
current_ip=$(jq -r '.result[0].content // empty' <<<"$current_json")
# 2. Compare.
if [[ -n "$current_id" && "$current_ip" == "$desired_ip" ]]; then
log "$name -> $desired_ip already correct"
return 0
fi
# 3. Apply minimal delta.
if [[ -z "$current_id" ]]; then
# Create.
if (( DRY_RUN )); then
printf 'WOULD: POST DNS record %s A %s\n' "$name" "$desired_ip" >&2
else
curl -fsS -X POST \
-H "Authorization: Bearer $cf_token" -H "Content-Type: application/json" \
"https://api.cloudflare.com/client/v4/zones/$zone_id/dns_records" \
--data "$(jq -nc --arg n "$name" --arg ip "$desired_ip" \
'{type:"A",name:$n,content:$ip,ttl:300}')" >/dev/null
chg "created DNS $name A $desired_ip"
fi
else
# Update.
if (( DRY_RUN )); then
printf 'WOULD: PATCH DNS %s A from %s to %s\n' "$name" "$current_ip" "$desired_ip" >&2
else
curl -fsS -X PATCH \
-H "Authorization: Bearer $cf_token" -H "Content-Type: application/json" \
"https://api.cloudflare.com/client/v4/zones/$zone_id/dns_records/$current_id" \
--data "$(jq -nc --arg ip "$desired_ip" '{content:$ip}')" >/dev/null
chg "updated DNS $name A: $current_ip -> $desired_ip"
fi
fi
}
ensure_dns_a_record "$ZONE_ID" "api.example.com" "203.0.113.42"
The four-step pattern is identical to local file/user reconciliation: read, compare, compute delta, apply minimum. The only difference is the API instead of a syscall.
Recipe 4: Database migration with idempotent runner
# Run SQL migration files only once each, tracked in a migrations table.
# Uses Postgres; the same shape works for any DB.
PGCONN="${PGCONN:-postgresql://app:secret@localhost/app_prod}"
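ensure_migrations_table() {
  # CREATE TABLE IF NOT EXISTS is already idempotent, and the database is the
  # source of truth, so no local marker file is used here: a stale marker
  # would skip table creation on a freshly provisioned database.
  do_or_say psql "$PGCONN" -v ON_ERROR_STOP=1 -c "
    CREATE TABLE IF NOT EXISTS schema_migrations (
      id TEXT PRIMARY KEY,
      applied_at TIMESTAMPTZ NOT NULL DEFAULT now(),
      checksum TEXT NOT NULL
    );"
}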
apply_migration() {
local file="$1"
local id checksum
id=$(basename "$file" .sql)
checksum=$(sha256sum "$file" | awk '{print $1}')
# Already applied?
local applied
applied=$(psql "$PGCONN" -tAc \
"SELECT checksum FROM schema_migrations WHERE id='$id';" 2>/dev/null || true)
if [[ -n "$applied" ]]; then
if [[ "$applied" != "$checksum" ]]; then
err "migration $id already applied with different checksum ($applied vs $checksum)"
return 1
fi
log "migration $id already applied"
return 0
fi
# Apply in a transaction; record on success.
if (( DRY_RUN )); then
printf 'WOULD: apply migration %s (%s)\n' "$id" "$checksum" >&2
return 0
fi
psql "$PGCONN" -v ON_ERROR_STOP=1 --single-transaction <<EOF
\i $file
INSERT INTO schema_migrations (id, checksum) VALUES ('$id', '$checksum');
EOF
chg "applied migration $id"
}
ensure_migrations_table
for f in /opt/myapp/migrations/*.sql; do
apply_migration "$f"
done
Why this is idempotent:
- The `schema_migrations` table is the durable state (the database itself, not a file).
- Re-running iterates all files; already-applied ones are no-ops.
- Checksum comparison catches a class of bugs: someone edited a migration file after it was applied. The script refuses to re-run silently and surfaces the drift.
- `--single-transaction` ensures the migration and the marker insert succeed atomically; a crash mid-migration leaves no marker, so the next run retries cleanly.
Visual: The Reconciliation Lifecycle
┌────────────────────────────────────────────────────┐
│ SCRIPT INVOCATION │
│ ./apply.sh or ./apply.sh --dry-run │
└────────────────────┬───────────────────────────────┘
│
▼
┌──────────────────────────────┐
│ Source lib/state.sh │
│ Init STATE_DIR, parse flags │
└─────────────┬────────────────┘
│
▼
┌───────────────────┴───────────────────┐
│ For each ensure_* (the desired │
│ state declarations): │
│ │
│ ┌─────────────────────────────────┐ │
│ │ 1. Define desired (arguments) │ │
│ └─────────────┬───────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ 2. Read actual (system query) │ │
│ └─────────────┬───────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ 3. Compute delta │ │
│ └─────────────┬───────────────────┘ │
│ ▼ │
│ ┌──────────┴──────────┐ │
│ ▼ delta empty ▼ non-empty│
│ no-op do_or_say │
│ (silent) CHANGE log │
│ │
└───────────────────────────────────────┘
│
▼
┌──────────────────────────────┐
│ Summary: │
│ N actions taken │
│ M no-ops (already correct) │
│ K dry-run announcements │
└──────────────────────────────┘
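The summary box at the bottom of the diagram needs counters that the library above does not define. A minimal sketch of how they could be kept follows; all names here (`N_CHANGED`, `note_noop`, `run_change`, `print_summary`, `ensure_dir_counted`) are hypothetical and not part of `lib/state.sh`.

```shell
# Hypothetical counters for the end-of-run summary.
N_CHANGED=0 N_NOOP=0 N_WOULD=0
DRY_RUN="${DRY_RUN:-0}"

# Record that desired state already held.
note_noop() { N_NOOP=$((N_NOOP + 1)); }

# Gate a mutating command and count it as a change or an announcement.
# Must be called in the current shell (not in a subshell or pipeline),
# otherwise the counter updates are lost.
run_change() {
  if (( DRY_RUN )); then
    printf 'WOULD: %s\n' "$*" >&2
    N_WOULD=$((N_WOULD + 1))
  else
    "$@" && N_CHANGED=$((N_CHANGED + 1))
  fi
}

print_summary() {
  printf 'Summary: %d actions taken, %d no-ops, %d dry-run announcements\n' \
    "$N_CHANGED" "$N_NOOP" "$N_WOULD"
}

# Example ensure_* function wired to the counters.
ensure_dir_counted() {
  if [[ -d "$1" ]]; then
    note_noop
  else
    run_change mkdir -p "$1"
  fi
}
```

Calling `print_summary` as the last line of the script gives operators a single line to compare between a dry run and the live run that follows it.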
The 12-Item Idempotency Footgun List
After reviewing dozens of “almost-idempotent” scripts in production, these are the patterns that fail in practice:
1. The “marker before work” bug. Writing the state file before the operation succeeds, so a crash leaves a marker that lies. Always write the marker as the last step.
2. `mkdir` without `-p`. First run succeeds, second run fails with “File exists.” Always `mkdir -p`, even when you “know” the dir doesn’t exist.
3. `echo "line" >> file` without sentinel. Each rerun appends another copy. Always pair appends with a sentinel-comment guard.
4. `useradd` without an `id -u` guard. Fails on rerun with “user already exists.” Always read first.
5. `ln -s` without `-fn`. First run creates the link; second run fails. Worse, if the link target is a directory and `-n` is missing, `ln -sf` creates a link inside the directory. Always `ln -sfn`.
6. `set -e` masking idempotency holes. `useradd user` fails because user exists; `set -e` aborts the script. The author “fixes” it with `useradd user || true`, hiding the failure. The right fix is to read first.
7. `sed -i` without backup file disposal. `sed -i.bak` leaves `.bak` files everywhere on every run. Either `sed -i` (no backup, GNU only) or `sed -i.bak` followed by `rm -f file.bak`. Note: `sed -i` syntax differs between GNU sed and BSD sed; on macOS use `sed -i ''` or `sed -i.bak` with cleanup.
8. Dry-run that doesn’t reach all code paths. If `if (( DRY_RUN )); then return; fi` is at the top of every function, you’re testing nothing. Use a `do_or_say` wrapper around mutating commands; reads always run.
9. State stored in `/tmp`. `systemd-tmpfiles` may clean `/tmp` on reboot or schedule. Your “already done” marker disappears, the script re-runs the work, and you get duplicate side effects. Use `/var/lib/myapp/state` for system state.
10. Markers in different namespaces collide. Two scripts both write `/var/lib/state/done`; one clobbers the other’s state. Always namespace with the script name: `$STATE_DIR/myapp/v2-installed`.
11. Reconciliation that compares wrong fields. “Does the user exist?” → `id -u` says yes. “Is the home dir right?” → never checked. The user drifted. Reconcile every field you declare desired state for.
12. Idempotency that depends on shared mutable state. Two scripts both manage `/etc/sysctl.d/99-tuning.conf`. They each write their own line, but on conflict, the last one wins. The reconciliation pattern only protects what one script owns. For multi-owner files, use `/etc/sysctl.d/<owner>-<name>.conf` per owner.
Quick-Reference Card
┌─ THE FOUR-STEP RECONCILIATION ────────────────────────────────────────┐
│ 1. desired = arguments "I want X to be true" │
│ 2. actual = read system "What is true now?" │
│ 3. delta = desired - actual "What's different?" │
│ 4. apply minimal change "Move only the delta" │
└────────────────────────────────────────────────────────────────────────┘
┌─ IDEMPOTENT PRIMITIVES (use these by default) ────────────────────────┐
│ mkdir -p DIR idempotent dir create │
│ ln -sfn TARGET LINK idempotent symlink │
│ install -m MODE -o U -g G S D idempotent file install w/ perms │
│ rm -f FILE idempotent removal │
│ id -u USER || useradd USER guarded user create │
│ getent group GRP || groupadd GRP guarded group create │
│ systemctl is-active SVC || start guarded service start │
└────────────────────────────────────────────────────────────────────────┘
┌─ STATE MARKERS ───────────────────────────────────────────────────────┐
│ Where: /var/lib/$APP/state/ (NOT /tmp) │
│ Name: <action>-<version>.marker (versioned) │
│ Content: ISO 8601 timestamp + identifying metadata │
│ When to write: AFTER the work succeeds, never before │
│ When to use: one-shot/expensive ops; NOT for ongoing config │
└────────────────────────────────────────────────────────────────────────┘
┌─ DRY-RUN DISCIPLINE ──────────────────────────────────────────────────┐
│ Reads run live (id, getent, stat, curl GET). │
│ Writes gated by do_or_say: in DRY_RUN=1, prints "WOULD: cmd". │
│ Same code paths as live mode; only side effects differ. │
│ Output is greppable: `script --dry-run 2>&1 | grep ^WOULD`.            │
└────────────────────────────────────────────────────────────────────────┘
┌─ APPEND-ONLY-IF-MISSING ──────────────────────────────────────────────┐
│ grep -qF "$sentinel" "$file" && update_in_place || append │
│ Sentinel = unique comment: "# managed-by:$APP:$RULE" │
│ Survives reformatting; greppable in audit │
└────────────────────────────────────────────────────────────────────────┘
┌─ ATOMIC FILE WRITE ───────────────────────────────────────────────────┐
│ tmp=$(mktemp "$path.XXXXXX") │
│ printf '%s' "$content" >"$tmp" │
│ chmod MODE "$tmp"; chown U:G "$tmp"                                    │
│ mv -f "$tmp" "$path"                                                   │
│ Reader sees old or new file, never partial. │
└────────────────────────────────────────────────────────────────────────┘
What’s Next
Idempotency is the discipline; filesystem semantics is what your idempotent operations rest on. The next lesson, Filesystem Semantics: Hard Links, Symlinks, Mount Namespaces & fsync Discipline, drills into the mechanics that determine whether your atomic writes are actually atomic, why ln -sfn behaves differently across filesystems, when mv is atomic and when it isn’t (cross-device renames!), and how fsync discipline turns “I wrote it to disk” into a guarantee instead of a hope. Together with this lesson, those primitives let you build scripts that converge correctly even on flaky filesystems and crash-prone hosts.